Abstract

Most performance measures of pilot-assisted multiple-input multiple-output (MIMO) systems are functions that depend on both the linear precoding filter and the pilot sequence. A framework for the optimization of these two parameters is proposed, based on a matrix-valued generalization of the concept of effective signal-to-noise ratio (SNR) introduced in a famous work by Hassibi and Hochwald [1]. The framework applies to a wide class of utility functions of said effective SNR matrix, most notably a well-known mutual information expression for Gaussian inputs, an upper bound on the minimum mean-square error (MMSE), as well as approximations thereof. The approach consists of decomposing the joint optimization problem into three subproblems: first, we describe how to reformulate the optimization of the linear precoder subject to a fixed pilot sequence as a convex problem. Second, we do likewise for the optimization of the pilot sequence subject to a fixed precoder. Third, we describe how to generate pairs of precoders and pilot sequences that are Pareto optimal in the sense that they attain the Pareto boundary of the set of feasible effective SNR matrices. By combining these three optimization problems into an iteration, we obtain an algorithm that computes jointly optimal pairs of precoders and pilot sequences with respect to some generic utility function of the effective SNR.

Index Terms 

Channel estimation, mutual information, Rayleigh fading, wireless communications 

I. Introduction 

WHEN the receiver has no genie-provided knowledge of the fading gains, a common approach is to incorporate a pre-agreed pattern of training or pilot symbols into the transmitted signal. The receiver first exploits the training observation to generate an estimate of the fading gains, and then uses this channel estimate to decode the transmitted message. This two-stage approach is suboptimal compared to full-blown maximum-likelihood decoding, but it drastically reduces decoding complexity while maintaining near-optimal performance. We shall consider a narrow-band MIMO channel with time-duplexed training, that is, certain time slots are reserved exclusively for transmitting pilot symbols, while other time slots are reserved for data symbols. However, much of what is described in this article applies as well to wideband channels with frequency-duplexed training (pilot tones).

In our discrete-time Rayleigh block-fading model, the statistics of the channel gains are fully described by their second-order moments (the covariance of the fading coefficients) and by the fading-block length, also called the coherence time. Therefore, when the feedback is limited to statistical information (as in the scenario we shall consider), the pilot sequence and the precoder can only be designed based on these two statistical channel parameters.

A frequent yet suboptimal choice in the literature is that of generic orthonormal pilot symbols. Moreover, many publications focus on distortion measures such as the mean-square error when designing the pilot sequence (e.g., [3], [4]), while focusing on other measures such as bit-error rate or mutual information when designing the precoder. In light of this situation, it is of both practical and theoretical interest to examine what performance gains can potentially be achieved by jointly designing the pilot sequence and the precoder, based on statistical channel knowledge and with respect to a single system performance metric.

In the present article, the metric of choice will be a well-known expression for the Gaussian-input mutual information between the channel input and the output of a mismatched decoder, which treats the channel estimate as if it were the true channel gain, and seeks to minimize the Euclidean distance between the received signal and the expected output that would have been produced by the candidate codeword. This so-called nearest-neighbor decoder was studied in [5] in the single-antenna setting and in [6] for the general multi-antenna case. Predating these publications, the mutual information achieved by this decoding scheme was also well known (in a less general interpretation) as a lower bound on the mutual information between the channel input and the raw receiver observation, in systems with imperfect channel-state information at the receiver. In this weaker formulation, it was originally proposed by Médard [7] and later generalized to MIMO training-based systems by Hassibi and Hochwald [1]. In numerous variations and different settings, the essence of the bounding technique proposed in [1] has been extensively used in subsequent works on transmission with imperfect channel-state information (e.g., in [8]-[13], to cite only a few), most often as a performance metric for system design.

This work was supported by the Spanish Science and Technology Commissions and FEDER funds from the EC (TEC2010-19171/TCM and CONSOLIDER INGENIO CSD2008-00010 COMONSENS), and 2009SGR-1236 of the Catalan government.
Parts of this work were published in [2].
In [1], the problem was considered of finding the optimal time share between training and transmission, as well as the optimal balance between training and transmit power levels. Later works have followed a similar approach: in [9], the optimal transmit covariance was shown to be diagonal, with eigenvalues given by the solution of a convex problem. However, in both [1] and [9], all results were derived exclusively for uncorrelated fading. When facing the more difficult, yet more realistic, situation of correlated fading, the question of joint optimality of pilot sequence and precoder is much more involved. For example, one can intuit that the number of pilot symbols and the number of data streams will depend, among other things, on the conditioning of the channel's correlation structure. The authors of [12] approached this problem by designing the pilot sequence so as to minimize the variance of the channel estimation error by means of a waterfilling-type algorithm. However, this approach is merely heuristic. The present work proposes a framework to tackle this problem optimally.

We consider a single-user multiple-input multiple-output (MIMO) link and assume a highly scattering environment at 
the receiver — as is the case in many downlink scenarios — so that the fading is correlated only at the transmitter side. 
This encompasses the important special case of fully correlated multiple-input single-output (MISO) links. Additionally, the 
main results of the present work can be generalized straightforwardly to MIMO multiple-access channels with transmit-side 
correlation, when viewing the multiple-access channel as a large MIMO channel with additional block-diagonality constraints 
on the channel correlation, the transmit covariance and the pilot sequence. 

In [1], the concept of effective signal-to-noise ratio (SNR) was introduced to designate an SNR that accounts for the imperfection of channel state information (CSI) at the receiver. The mutual information for Gaussian inputs (which in [1] is interpreted as a capacity lower bound) is an increasing function of this effective SNR, which thus serves as the figure of merit to be maximized. In the present work, we follow a similar line of thought; however, we treat the more general case of correlated fading, for which the definition of the effective SNR needs to be extended from a scalar to a matrix-valued quantity. Hence, the concept of effective SNR maximization needs to be extended accordingly to a Pareto optimization. Among all Pareto optimal solutions, the optimum will be determined by the specific choice of the utility function.

A procedure is proposed by which the non-convex joint pilot-precoder optimization problem is decomposed into three 
subproblems, each of which can be cast into a convex optimization problem. An iteration cycles through these three optimization 
steps to compute the joint optimum: the first step consists in optimizing the precoder while keeping the pilot sequence fixed, 
the second step consists in optimizing the pilot sequence while keeping the precoder fixed, and the third step adjusts the 
pilot-precoder pair so as to be Pareto optimal in terms of the matrix-valued effective SNR. 

A main result in the analysis of the joint optimization problem is that the left singular vectors of the precoder and of the pilot sequence matrix must be eigenvectors of the channel covariance matrix. Loosely speaking, this means that the training symbols and the multiple beamforming vectors should be aligned with the channel eigenmodes.

The article is structured as follows: Section II defines notation; Section III describes the system model; Section IV defines and motivates the class of utility functions considered in the optimization framework; Section V states the optimization problem to consider; Sections VI and VII describe how the optimization of the precoder (resp. pilot sequence) subject to a fixed pilot sequence (resp. precoder) is cast into a convex problem; Section VIII specifies the jointly optimal training and transmit directions and shows how the residual problem of computing pilot-precoder power loading vectors that are Pareto optimal in terms of the effective SNR can be formulated as a quasi-convex problem; Section IX assembles the findings from Sections VI, VII and VIII into an iterative algorithm that achieves the jointly optimal pilot-precoder design.



II. Notation 

The operators $(\cdot)^T$, $(\cdot)^*$ and $(\cdot)^H$ denote the transpose, the complex conjugate, and the conjugate transpose (Hermitian adjoint) of a matrix, respectively. Matrix square roots are denoted as $(\cdot)^{1/2}$ and are assumed to be Hermitian. The Moore-Penrose pseudoinverse $A^+$ of a matrix $A$ is uniquely defined by the four identities

$$AA^+A = A, \qquad A^+AA^+ = A^+,$$
$$(AA^+)^H = AA^+, \qquad (A^+A)^H = A^+A.$$

The range of a matrix $A$, denoted as $\operatorname{range}(A)$, shall be the linear space spanned by its columns. The set of columns of a matrix $A$ is denoted as $\mathcal{C}(A)$.

The trace and determinant of a square matrix are written as $\operatorname{tr}(\cdot)$ and $\det(\cdot)$, respectively.

We will occasionally use the entrywise comparison $a \le b$ between two real-valued vectors $a$ and $b$, of $i$-th entries $a_i$ and $b_i$, defined as $a \le b \Leftrightarrow \forall i: a_i \le b_i$. If $A$ and $B$ denote Hermitian matrices, $A \preceq B$ (resp. $A \prec B$) means that $B - A$ is positive semidefinite (resp. positive definite).

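For a set of real vectors $\mathcal{X} \subset \mathbb{R}^n$, the so-called Pareto border $\partial^+\mathcal{X} \subseteq \mathcal{X}$ contains those points from $\mathcal{X}$ which are not dominated by any other point from $\mathcal{X}$, in the sense that for any point $x^+ \in \partial^+\mathcal{X}$, there is no $x \in \mathcal{X}$ distinct from $x^+$ such that $x \ge x^+$.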

The expectation of a random variable is denoted by $\mathrm{E}[\cdot]$. If the distribution of a complex random vector $x$ is proper Gaussian, we write $x \sim \mathcal{N}_{\mathbb{C}}(\bar{x}, R_x)$, where $\bar{x} = \mathrm{E}[x]$ and $R_x = \mathrm{E}[(x-\bar{x})(x-\bar{x})^H]$ stand for the mean and the covariance of $x$, respectively.
We denote by $\mathcal{U}^{m\times n} \subset \mathbb{C}^{m\times n}$ the set of (sub-)unitary complex matrices defined by

• $UU^H = I$ if $m \le n$

• $U^HU = I$ if $m \ge n$.

We denote by $\mathcal{P}^n = \mathcal{U}^{n\times n} \cap \{0,1\}^{n\times n}$ the symmetric group of permutation matrices.

Unless stated otherwise, $U_A$ denotes the reduced left singular basis of a matrix $A$. For $A$ Hermitian, $U_A$ is thus the reduced eigenbasis, with the number of columns equal to the rank of $A$.

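The cone of positive definite (resp. positive semidefinite) matrices from $\mathbb{C}^{n\times n}$ is denoted $\mathbb{C}_{++}^{n\times n}$ (resp. $\mathbb{C}_{+}^{n\times n}$), and is formally defined as:

$$\mathbb{C}_{++}^{n\times n} = \{A \in \mathbb{C}^{n\times n} \mid \forall x \in \mathbb{C}^n\setminus\{0\}: x^HAx > 0\}$$
$$\mathbb{C}_{+}^{n\times n} = \{A \in \mathbb{C}^{n\times n} \mid \forall x \in \mathbb{C}^n: x^HAx \ge 0\}.$$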
Subsets of the non-negative orthant whose elements sum up to a value not larger than $a$ will be denoted $\mathcal{D}(a)$:

$$\mathcal{D}(a) = \{x \in \mathbb{R}_+^n \mid \mathbf{1}^Tx \le a\}.$$

The dimension of $\mathcal{D}(a)$ ($n$ in the above case) will be clear from the context. Usually, the dimension will be equal to the system's number of transmit antennas.

The vectorization operator $\operatorname{vec}(A)$ takes a matrix $A = [a_1, a_2, \ldots]$ as argument, and returns a vector $\operatorname{vec}(A) = [a_1^T, a_2^T, \ldots]^T$ containing the columns of $A$ stacked on top of each other.

III. System Model 

Our system consists of a standard single-user MIMO link with an $N_R \times N_T$ channel matrix $H$ expressible as

$$H = WR^{1/2} \tag{1}$$

where the entries of $W \in \mathbb{C}^{N_R\times N_T}$ are independent and identically distributed (i.i.d.) zero-mean circularly-symmetric unit-variance complex Gaussian, i.e., $\operatorname{vec}(W) \sim \mathcal{N}_{\mathbb{C}}(0, I)$. The matrix $W$ is the white random component of the channel matrix, whereas $R = \frac{1}{N_R}\mathrm{E}[H^HH]$ is the deterministic component and represents the transmit-side correlation. The latter is assumed to be full-rank, since we ignore keyhole effects. This correlation model is valid in setups where numerous scatterers are located in the vicinity of the receiver, and notably subsumes the case of correlated multiple-input single-output (MISO) channels, which are particularly relevant in wireless downlinks.
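The channel model (1) is straightforward to simulate. The Python/NumPy sketch below is illustrative only: the exponential correlation model $R_{ij} = \rho^{|i-j|}$ (with $\rho = 0.8$), the antenna numbers and the helper names `psd_sqrt`, `exp_corr` and `cgauss` are assumptions, not part of the system model. It draws realizations of $H = WR^{1/2}$ and checks that the sample average of $\frac{1}{N_R}H^HH$ approaches $R$.

```python
import numpy as np

def psd_sqrt(A):
    """Hermitian square root of a Hermitian PSD matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T

def exp_corr(n, rho):
    """Hypothetical exponential transmit-correlation model R_ij = rho^|i-j|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def cgauss(rng, shape):
    """i.i.d. zero-mean, circularly-symmetric, unit-variance complex Gaussian entries."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

rng = np.random.default_rng(0)
n_t, n_r, trials = 4, 2, 20000
R = exp_corr(n_t, 0.8)

# Batch of channel realizations H = W R^{1/2}, shape (trials, n_r, n_t)
H = cgauss(rng, (trials, n_r, n_t)) @ psd_sqrt(R)

# Sample counterpart of R = (1/N_R) E[H^H H]
R_est = np.einsum("kij,kil->jl", H.conj(), H) / (n_r * trials)
err = np.max(np.abs(R_est - R))
print(f"largest entrywise deviation from R: {err:.4f}")
```

The deviation shrinks roughly as the inverse square root of the number of trials.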

The channel remains constant for a duration $T$ called the channel coherence time, after which it changes to a new realization that is independent of all previous ones (block fading). Within every such fading block, we reserve $T_\tau$ time slots to transmit a sequence of pilot symbols known at the receiver, while the data is transmitted during the remaining $T - T_\tau$ time slots. Without loss of generality, we can accommodate the pilot symbols in the first time slots of each fading block. During data transmission phases, the received signal at time instant $k$ is

$$y^{(k)} = HFx^{(k)} + z^{(k)}, \quad k = T_\tau+1,\ldots,T \tag{2}$$

where $x^{(k)} \sim \mathcal{N}_{\mathbb{C}}(0, I_r)$ is an $r\times 1$ vector containing Gaussian inputs multiplexed into $r$ independent substreams, $F \in \mathbb{C}^{N_T\times r}$ is the linear precoder, and $z^{(k)} \sim \mathcal{N}_{\mathbb{C}}(0, I)$ is a normalized independent additive Gaussian noise term. The Gram matrix $Q = FF^H$ represents the covariance of the transmit signal $Fx^{(k)}$, and is thus called the transmit covariance. We assume that the $x^{(k)}$ and $z^{(k)}$ are i.i.d. across the time index $k$. During training phases, a sequence $T = [t^{(1)}, t^{(2)}, \ldots] \in \mathbb{C}^{N_T\times T_\tau}$ of pilot symbols is sent. At time instant $k$, the receiver observes

$$y^{(k)} = Ht^{(k)} + z^{(k)}, \quad k = 1,\ldots,T_\tau. \tag{3}$$

The noisy training observations $y^{(k)}$ are stored in a matrix $Y_\tau = [y^{(1)}, y^{(2)}, \ldots]$. It can be shown that the MMSE channel estimate $\hat{H}$ is obtained by right-multiplying $Y_\tau$ with the estimator matrix $G = (T^HRT + I)^{-1}T^HR$:

$$\hat{H} = Y_\tau G. \tag{4}$$

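As a consequence of the correlation models for $H$ and $z$, the respective marginal distributions of the estimate $\hat{H}$ and of the estimation error $\tilde{H} = H - \hat{H}$ turn out to be $\hat{h} = \operatorname{vec}(\hat{H}) \sim \mathcal{N}_{\mathbb{C}}(0, \hat{R}^T \otimes I)$ and $\tilde{h} = \operatorname{vec}(\tilde{H}) \sim \mathcal{N}_{\mathbb{C}}(0, \tilde{R}^T \otimes I)$ with transmit-side covariances

$$\hat{R} = \tfrac{1}{N_R}\mathrm{E}[\hat{H}^H\hat{H}] = R - \tilde{R} \tag{5a}$$
$$\tilde{R} = \tfrac{1}{N_R}\mathrm{E}[\tilde{H}^H\tilde{H}] = (R^{-1} + P)^{-1}, \tag{5b}$$

where $P = TT^H$ denotes the Gram matrix of the pilot sequence matrix $T$, and shall from now on be called the pilot Gram. Note that we can write

$$\hat{H} = \hat{W}\hat{R}^{1/2}, \qquad \tilde{H} = \tilde{W}\tilde{R}^{1/2} \tag{6}$$

with $\operatorname{vec}(\hat{W}) \sim \mathcal{N}_{\mathbb{C}}(0, I)$ and $\operatorname{vec}(\tilde{W}) \sim \mathcal{N}_{\mathbb{C}}(0, I)$.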
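The training phase (3), the estimator (4) and the covariances (5a)-(5b) can be checked by Monte Carlo simulation. In the following sketch, the exponential correlation model and the random, non-optimized pilot matrix are assumptions chosen only for illustration; the sample covariances of the estimation error and of the estimate should approach $\tilde{R} = (R^{-1}+P)^{-1}$ and $\hat{R} = R - \tilde{R}$, respectively.

```python
import numpy as np

def psd_sqrt(A):
    """Hermitian square root of a Hermitian PSD matrix."""
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T

def cgauss(rng, shape):
    """i.i.d. CN(0, 1) entries."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def sample_cov(X):
    """Sample transmit-side covariance (1/N_R) E[X^H X] over a batch of shape (trials, N_R, N_T)."""
    return np.einsum("kij,kil->jl", X.conj(), X) / (X.shape[1] * X.shape[0])

rng = np.random.default_rng(1)
n_t, n_r, t_tau, trials = 3, 2, 2, 20000
idx = np.arange(n_t)
R = 0.7 ** np.abs(idx[:, None] - idx[None, :])      # hypothetical transmit correlation
T = cgauss(rng, (n_t, t_tau))                       # arbitrary pilot matrix (not optimized)
P = T @ T.conj().T                                  # pilot Gram P = T T^H

# Estimator matrix G = (T^H R T + I)^{-1} T^H R from (4)
G = np.linalg.solve(T.conj().T @ R @ T + np.eye(t_tau), T.conj().T @ R)

H = cgauss(rng, (trials, n_r, n_t)) @ psd_sqrt(R)   # H = W R^{1/2}
Y_tau = H @ T + cgauss(rng, (trials, n_r, t_tau))   # training observations (3)
H_hat = Y_tau @ G                                   # channel estimate (4)
H_err = H - H_hat                                   # estimation error

R_err = np.linalg.inv(np.linalg.inv(R) + P)         # (5b)
R_hat = R - R_err                                   # (5a)
print(np.max(np.abs(sample_cov(H_err) - R_err)))
print(np.max(np.abs(sample_cov(H_hat) - R_hat)))
```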

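The error covariance $\tilde{R}$ is non-singular by construction, whereas for $\hat{R}$, the following always holds:

$$\operatorname{rank}(\hat{R}) = \operatorname{rank}(P). \tag{7}$$

This rank equality is easily seen by application of the matrix inversion lemma: denoting by $P = U_P\Lambda_PU_P^H$ the reduced eigendecomposition of $P$, where $\Lambda_P$ is diagonal full-rank of dimension $\operatorname{rank}(P)\times\operatorname{rank}(P)$, and $U_P \in \mathbb{C}^{N_T\times\operatorname{rank}(P)}$ has orthonormal columns, we have

$$\hat{R} = R - (R^{-1} + U_P\Lambda_PU_P^H)^{-1} = RU_P(\Lambda_P^{-1} + U_P^HRU_P)^{-1}U_P^HR. \tag{8}$$

Since $RU_P$ has full column rank, it becomes manifest that we always have

$$\operatorname{rank}(\hat{R}) = \operatorname{rank}\big((\Lambda_P^{-1} + U_P^HRU_P)^{-1}\big) = \operatorname{rank}(P). \tag{9}$$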
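A quick numerical check of (7)-(9), assuming a hypothetical exponential correlation matrix and a random pilot matrix with fewer columns than transmit antennas: the rank of $\hat{R}$ matches that of $P$, and the matrix-inversion-lemma form (8) agrees with the direct computation $R - \tilde{R}$.

```python
import numpy as np

rng = np.random.default_rng(2)
n_t, t_tau = 4, 2
idx = np.arange(n_t)
R = 0.9 ** np.abs(idx[:, None] - idx[None, :])      # hypothetical full-rank correlation
T = rng.standard_normal((n_t, t_tau)) + 1j * rng.standard_normal((n_t, t_tau))
P = T @ T.conj().T                                  # rank(P) = t_tau < n_t (almost surely)

R_err = np.linalg.inv(np.linalg.inv(R) + P)         # (5b): non-singular
R_hat = R - R_err                                   # (5a)

# Reduced eigendecomposition P = U_P Lambda_P U_P^H: keep only the non-zero eigenvalues
lam, U = np.linalg.eigh(P)
keep = lam > 1e-10 * lam.max()
lam, U_P = lam[keep], U[:, keep]

# Right-hand side of (8), obtained with the matrix inversion lemma
R_hat_8 = R @ U_P @ np.linalg.inv(np.diag(1.0 / lam) + U_P.conj().T @ R @ U_P) @ U_P.conj().T @ R

rank_hat = np.linalg.matrix_rank(R_hat, tol=1e-8)
rank_P = np.linalg.matrix_rank(P, tol=1e-8)
print(rank_hat, rank_P, np.allclose(R_hat, R_hat_8))
```

An absolute rank tolerance is passed explicitly because $\hat{R}$ is obtained as a difference and its null-space eigenvalues are only zero up to round-off.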

IV. Utility Functions 

A. Matrix-valued effective SNR 

Omitting the time index $k$ for notational concision, we rewrite the system equation (2) as

$$y = \hat{H}Fx + \underbrace{\tilde{H}Fx + z}_{z_{\mathrm{eff}}}. \tag{10}$$

The first term $\hat{H}Fx$ represents the useful signal portion of the observation $y$, while the remaining term $z_{\mathrm{eff}} = \tilde{H}Fx + z$ is a non-Gaussian noise term, uncorrelated with (but not independent of) the input $x$. This noise term is commonly called effective noise. By treating the effective noise as if it were independent of the input, we incur a suboptimality in estimating/decoding $x$. Note that the effective noise has the covariance

$$\mathrm{E}[z_{\mathrm{eff}}z_{\mathrm{eff}}^H] = \big(1 + \operatorname{tr}(F^H\tilde{R}F)\big)I_{N_R} =: \sigma_{\mathrm{eff}}^2I_{N_R}. \tag{11}$$

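We whiten the random channel and normalize the transmit signal and effective noise by scaling and rewriting (10) as

$$\frac{1}{\sigma_{\mathrm{eff}}}\,y = \hat{W}Kx + \bar{z}_{\mathrm{eff}}, \tag{12}$$

where $\bar{z}_{\mathrm{eff}}$ is defined as $\bar{z}_{\mathrm{eff}} = z_{\mathrm{eff}}/\sigma_{\mathrm{eff}}$ and the matrix $K$ is defined as

$$K = \frac{\hat{R}^{1/2}F}{\sqrt{1 + \operatorname{tr}(F^H\tilde{R}F)}}. \tag{13}$$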

If the receiver attempts to generate an estimate $\hat{x}$ of the transmit symbols $x$, it may do so by minimizing the mean-square error (MSE) conditioned on the receiver side information $Y_\tau$ and on the observation $y$. This MSE is the trace of the MSE matrix

$$\operatorname{mse}[g] = \mathrm{E}[(x-\hat{x})(x-\hat{x})^H \mid Y_\tau, y], \tag{14}$$

wherein the estimate $\hat{x} = g(Y_\tau, y)$ is some deterministic function of the side information $Y_\tau$ and observation $y$. It is well known that this MSE functional takes its minimum (the minimum MSE, in short MMSE) when $g$ is the conditional mean estimator (CME), i.e.,

$$\hat{x} = g_{\mathrm{CME}}(Y_\tau, y) = \mathrm{E}[x \mid Y_\tau, y]. \tag{15}$$

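Since this estimate is difficult to compute exactly in our channel model, we content ourselves with the (suboptimal) linear minimum mean-square error (LMMSE) estimate

$$\hat{x} = g_{\mathrm{LMMSE}}(Y_\tau, y) = G_{\mathrm{LMMSE}}(Y_\tau)\,y \tag{16}$$

wherein the linear estimator $G_{\mathrm{LMMSE}}(Y_\tau)$ reads as

$$G_{\mathrm{LMMSE}}(Y_\tau) = \mathrm{E}[xy^H \mid Y_\tau]\,\mathrm{E}[yy^H \mid Y_\tau]^{-1} = K^H\hat{W}^H(I + \hat{W}S\hat{W}^H)^{-1}, \tag{17}$$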
where the matrix $S$, which is the Gram matrix of $K$, i.e.,

$$S = KK^H = \frac{\hat{R}^{1/2}Q\hat{R}^{1/2}}{1 + \operatorname{tr}(Q\tilde{R})}, \tag{18}$$

represents the matrix-valued effective SNR. The above LMMSE estimate (16) is suboptimal in the sense that it yields an MSE that is at least as large as the actual MMSE, i.e., $\operatorname{mse}[g_{\mathrm{LMMSE}}] \succeq \operatorname{mse}[g_{\mathrm{CME}}] = \operatorname{mmse}$, and reads as (cf. Utility 9 in Table I, Appendix B)

$$\operatorname{mse}[g_{\mathrm{LMMSE}}] = I_r - K^H\hat{W}^H(I + \hat{W}S\hat{W}^H)^{-1}\hat{W}K. \tag{19}$$
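The following sketch computes the effective SNR (18) from $R$, $P$ and $Q$, and checks that it coincides with the Gram matrix $KK^H$ of the normalized precoder (13). The correlation matrix, the rank-2 pilot Gram and the random precoder are illustrative assumptions, as are the helper names `estimation_covariances` and `effective_snr`.

```python
import numpy as np

def psd_sqrt(A):
    """Hermitian square root of a Hermitian PSD matrix."""
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T

def estimation_covariances(R, P):
    """Estimate covariance (5a) and error covariance (5b) for a pilot Gram P."""
    R_err = np.linalg.inv(np.linalg.inv(R) + P)
    return R - R_err, R_err

def effective_snr(R, P, Q):
    """Matrix-valued effective SNR (18): R_hat^{1/2} Q R_hat^{1/2} / (1 + tr(Q R_err))."""
    R_hat, R_err = estimation_covariances(R, P)
    Rh = psd_sqrt(R_hat)
    return Rh @ Q @ Rh / (1.0 + np.trace(Q @ R_err).real)

rng = np.random.default_rng(3)
n_t, r = 3, 2
idx = np.arange(n_t)
R = 0.6 ** np.abs(idx[:, None] - idx[None, :])      # hypothetical correlation
P = np.diag([2.0, 1.0, 0.0])                        # hypothetical rank-2 pilot Gram
F = rng.standard_normal((n_t, r)) + 1j * rng.standard_normal((n_t, r))  # arbitrary precoder
Q = F @ F.conj().T                                  # transmit covariance Q = F F^H

R_hat, R_err = estimation_covariances(R, P)
K = psd_sqrt(R_hat) @ F / np.sqrt(1.0 + np.trace(F.conj().T @ R_err @ F).real)  # (13)
S = effective_snr(R, P, Q)
print(np.allclose(S, K @ K.conj().T))               # S is the Gram matrix of K
```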
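The average scalar mean-square error achieved with said LMMSE symbol estimator is thus

$$\begin{aligned}\mathrm{E}[\|x-\hat{x}\|^2] &= \mathrm{E}\operatorname{tr}(\operatorname{mse}[g_{\mathrm{LMMSE}}]) \\ &= r - \operatorname{tr}\mathrm{E}[(I + \hat{W}S\hat{W}^H)^{-1}\hat{W}S\hat{W}^H] \\ &= r - N_R + \operatorname{tr}\mathrm{E}[(I + \hat{W}S\hat{W}^H)^{-1}].\end{aligned} \tag{20}$$

This upper bound on the average data symbol MMSE constitutes a basic performance metric of the considered MIMO system.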
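The MSE bound (20) can be estimated by averaging over draws of $\hat{W}$. This sketch, which assumes an arbitrary (hypothetical) matrix $K$ in place of (13), evaluates $\operatorname{tr}(\operatorname{mse}[g_{\mathrm{LMMSE}}])$ from (19) realization by realization, confirms that it equals the integrand of the last line of (20), and averages it.

```python
import numpy as np

rng = np.random.default_rng(4)
n_t, n_r, r, trials = 3, 2, 2, 5000
K = 0.5 * (rng.standard_normal((n_t, r)) + 1j * rng.standard_normal((n_t, r)))  # hypothetical K
S = K @ K.conj().T                                  # effective SNR S = K K^H

# Whitened channel estimates W_hat with i.i.d. CN(0, 1) entries, shape (trials, n_r, n_t)
W = (rng.standard_normal((trials, n_r, n_t)) + 1j * rng.standard_normal((trials, n_r, n_t))) / np.sqrt(2)
WH = np.conj(np.swapaxes(W, 1, 2))                  # batched conjugate transpose
M = np.eye(n_r) + W @ S @ WH                        # I + W S W^H

# tr(mse) from (19), realization by realization
mse = np.eye(r) - K.conj().T @ WH @ np.linalg.solve(M, W @ K)
tr_mse = np.trace(mse, axis1=1, axis2=2).real

# Integrand of the last line of (20): r - N_R + tr (I + W S W^H)^{-1}
tr_formula = r - n_r + np.trace(np.linalg.inv(M), axis1=1, axis2=2).real

mse_bound = tr_mse.mean()
print(np.allclose(tr_mse, tr_formula), mse_bound)
```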

Another important figure of merit, besides this MMSE bound, is the input-output mutual information of the channel. Denoting the differential entropy as $h(\cdot)$ and following the same lines as in the derivation found in [1], [14], the input-output mutual information $I(x;y \mid Y_\tau)$ can be lower bounded as

$$\begin{aligned}I(x;y \mid Y_\tau) &\ge I(x;G_{\mathrm{LMMSE}}(Y_\tau)y \mid Y_\tau) \\ &= h(x) - h(x \mid G_{\mathrm{LMMSE}}(Y_\tau)y, Y_\tau) \\ &\ge h(x) - h(x - G_{\mathrm{LMMSE}}(Y_\tau)y \mid Y_\tau) \\ &\ge h(x) - \mathrm{E}\log\det(\pi e\,\operatorname{mse}[g_{\mathrm{LMMSE}}]).\end{aligned} \tag{21}$$

Here, the first inequality is the data processing inequality, the second follows because removing the conditioning on the estimate cannot decrease the entropy, and the third comes from upper-bounding the entropy $h(x - G_{\mathrm{LMMSE}}(Y_\tau)y \mid Y_\tau)$ by the entropy of a Gaussian variable of the same covariance. By inserting (19) into (21), this mutual information lower bound reads as

$$\mathrm{E}\log\det(I + \hat{W}S\hat{W}^H) \le I(x;y \mid Y_\tau). \tag{22}$$

This bound has been widely used and studied in the literature, e.g., [7], [1], [13], [16]. It was generalized in [5] (for the single-antenna case) and [6] (for the multiple-antenna case), where the authors showed that this lower bound is in fact the mutual information between the input signal and the output of a nearest-neighbor decoder based on the Euclidean distance metric in the received signal space. Henceforth, we shall denote by $I$ this mutual information, or by $I(S)$ whenever we interpret it as a function of the effective SNR:

$$I(S) = \mathrm{E}\log\det(I + \hat{W}S\hat{W}^H). \tag{23}$$

This mutual information will be the main figure of merit that we seek to maximize.
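A Monte Carlo estimate of $I(S)$ in (23) takes only a few lines. The sketch below (with a hypothetical $K$ and natural logarithms, so the result is in nats; `mutual_info` is an illustrative helper name) also checks, for one realization, that $-\log\det(\operatorname{mse}[g_{\mathrm{LMMSE}}])$ from (19) equals $\log\det(I + \hat{W}S\hat{W}^H)$. This is the step that turns (21) into (22) once the $r\log(\pi e)$ terms of $h(x)$ cancel.

```python
import numpy as np

def mutual_info(S, n_r, trials, rng):
    """Monte Carlo estimate (in nats) of I(S) = E log det(I + W S W^H) from (23)."""
    n_t = S.shape[0]
    W = (rng.standard_normal((trials, n_r, n_t)) + 1j * rng.standard_normal((trials, n_r, n_t))) / np.sqrt(2)
    M = np.eye(n_r) + W @ S @ np.conj(np.swapaxes(W, 1, 2))
    return np.linalg.slogdet(M)[1].mean()

rng = np.random.default_rng(5)
n_t, n_r, r = 3, 2, 2
K = 0.5 * (rng.standard_normal((n_t, r)) + 1j * rng.standard_normal((n_t, r)))  # hypothetical K
S = K @ K.conj().T

# One realization: -log det(mse) from (19) equals log det(I + W S W^H)
W = (rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
M = np.eye(n_r) + W @ S @ W.conj().T
mse = np.eye(r) - K.conj().T @ W.conj().T @ np.linalg.solve(M, W @ K)
lhs = -np.linalg.slogdet(mse)[1]
rhs = np.linalg.slogdet(M)[1]
print(np.isclose(lhs, rhs), mutual_info(S, n_r, 20000, rng))
```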

Notice that both figures of merit presented above, namely the MMSE bound (20) and the mutual information (23), depend only on $S$, which in turn depends on the linear precoder $F$ and on the training sequence $T$ via their Gram forms alone, that is, the transmit covariance $Q = FF^H$ and the pilot Gram $P = TT^H$. Thus, we may occasionally write $S = S(P,Q)$ to emphasize this dependency. The matrix $S$ plays a central role in all subsequent considerations, since it concentrates all system parameters (the channel covariance $R$, the pilot Gram $P$ and the transmit covariance $Q$) into a single matrix.

Said matrix $S$ constitutes a matrix-valued generalization of the scalar effective SNR introduced by Hassibi and Hochwald in [1]. Much in the same way as the authors do in [1], we will seek to maximize this effective SNR (in a Pareto sense, to be specified later). Evidently, $S$ increases in the sense of matrix monotonicity when scaling up the pilot energy (24a) or the transmit power (24b):

$$0 < k < k' \Rightarrow S(kP, Q) \prec S(k'P, Q) \tag{24a}$$
$$0 < k < k' \Rightarrow S(P, kQ) \prec S(P, k'Q). \tag{24b}$$

Proof: See Appendix A. ■

The monotonicity in $P$ even holds in the stronger sense

$$0 \preceq P \preceq P' \Rightarrow S(P,Q) \preceq S(P',Q), \tag{25}$$

yet this is not true for the monotonicity in $Q$.
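The monotonicity statements can be explored numerically. This sketch (hypothetical $R$, $P$, $Q$) checks (24b) directly in the Loewner order. On the pilot side it compares the eigenvalue profiles of $S(P,Q)$, $S(P',Q)$ and $S(2P,Q)$ for $P \preceq P'$, since the profile is what every utility of the class $\mathcal{F}$ introduced in Section IV-B depends on. It closes with a small counterexample showing that enlarging $Q$ can shrink $S$: with $R = I$ and a pilot that trains only the first antenna, power added on the untrained second antenna only inflates the effective noise.

```python
import numpy as np

def psd_sqrt(A):
    """Hermitian square root of a Hermitian PSD matrix."""
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T

def effective_snr(R, P, Q):
    """Effective SNR (18) for pilot Gram P and transmit covariance Q."""
    R_err = np.linalg.inv(np.linalg.inv(R) + P)      # (5b)
    Rh = psd_sqrt(R - R_err)                         # R_hat^{1/2}, cf. (5a)
    return Rh @ Q @ Rh / (1.0 + np.trace(Q @ R_err).real)

def profile(S):
    """Eigenvalue profile lambda(S), sorted non-increasingly."""
    return np.sort(np.linalg.eigvalsh(S))[::-1]

def psd_gap(S_small, S_large):
    """Smallest eigenvalue of S_large - S_small (>= 0 iff S_small <= S_large)."""
    D = S_large - S_small
    return np.linalg.eigvalsh((D + D.conj().T) / 2).min()

rng = np.random.default_rng(6)
n_t, tol = 3, 1e-10
idx = np.arange(n_t)
R = 0.5 ** np.abs(idx[:, None] - idx[None, :])       # hypothetical correlation
X = rng.standard_normal((n_t, n_t)); P = X @ X.T      # hypothetical pilot Gram
Y = rng.standard_normal((n_t, n_t)); Q = Y @ Y.T      # hypothetical transmit covariance
Z = rng.standard_normal((n_t, 1)); P2 = P + Z @ Z.T   # P <= P2 in the Loewner order

S = effective_snr(R, P, Q)
ok_24b = psd_gap(S, effective_snr(R, P, 2 * Q)) >= -tol                  # (24b)
ok_P2 = np.all(profile(effective_snr(R, P2, Q)) >= profile(S) - tol)     # profiles, P <= P2
ok_2P = np.all(profile(effective_snr(R, 2 * P, Q)) >= profile(S) - tol)  # profiles, P <= 2P

# Counterexample in Q with R = I: train only antenna 1, then add power on antenna 2
I2, P_c = np.eye(2), np.diag([1.0, 0.0])
Q1, Q2 = np.diag([1.0, 0.0]), np.diag([1.0, 5.0])     # Q1 <= Q2
shrinks = psd_gap(effective_snr(I2, P_c, Q1), effective_snr(I2, P_c, Q2)) < 0
print(ok_24b, ok_P2, ok_2P, shrinks)
```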
B. General utility functions

For a matrix $X$, let $\lambda(X)$ denote the vector of non-increasingly ordered eigenvalues of $X$. We call $\lambda(X)$ the (eigenvalue) profile of $X$. Since $\hat{W}$ has i.i.d. circularly-symmetric complex Gaussian entries, it is invariant against unitary rotations, i.e., $\hat{W}$ and $\hat{W}U$ have the same marginal distribution for any unitary $U$. Therefore, the function $I$ is invariant against unitary transformations:

$$I(S) = I(U^HSU). \tag{26}$$

It is thus a symmetric function of the profile of $S$, which we shall denote as $s = \lambda(S)$. Henceforth, we may write $I(s)$ or $I(S)$ without distinction.
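Unitary invariance (26) can be observed numerically: replacing $S$ by the diagonal matrix of its profile, $\operatorname{diag}(s) = U^HSU$ for the eigenbasis $U$ of $S$, leaves the Monte Carlo estimate of $I$ unchanged up to sampling error. The random matrix used as $S$ below is a hypothetical example, and `mutual_info` is an illustrative helper name.

```python
import numpy as np

def mutual_info(S, n_r, W):
    """Sample mean of log det(I + W S W^H) over a fixed batch W (estimate of (23), in nats)."""
    M = np.eye(n_r) + W @ S @ np.conj(np.swapaxes(W, 1, 2))
    return np.linalg.slogdet(M)[1].mean()

rng = np.random.default_rng(8)
n_t, n_r, trials = 3, 2, 50000
A = rng.standard_normal((n_t, n_t)) + 1j * rng.standard_normal((n_t, n_t))
S = A @ A.conj().T / n_t                     # hypothetical effective SNR matrix
s = np.sort(np.linalg.eigvalsh(S))[::-1]     # its profile s = lambda(S)

W = (rng.standard_normal((trials, n_r, n_t)) + 1j * rng.standard_normal((trials, n_r, n_t))) / np.sqrt(2)
I_full = mutual_info(S, n_r, W)              # I(S)
I_diag = mutual_info(np.diag(s), n_r, W)     # I(U^H S U), U being the eigenbasis of S
print(I_full, I_diag)                        # agree up to Monte Carlo error
```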

Besides the mutual information $I(s)$, there are a number of other physically meaningful utilities that are functions of the profile $s$. Such functions constitute a class $\mathcal{F}$ of utilities (formally defined below) and can result from different design goals, optimization criteria, asymptotic or heuristic approximations of utilities, etc. Any utility function $F \in \mathcal{F}$ shares two essential properties with $I$, namely, that it should be matrix-monotonic and invariant against unitary transformations, as put forth in the formal definition below.

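Definition IV.1. A function $F: \mathbb{C}_+^{n\times n} \to \mathbb{R}$ belongs to the class $\mathcal{F}$ if it is matrix-monotonic and invariant against unitary transformations, i.e., if the following two conditions are met:

$$0 \preceq S \preceq S' \Rightarrow F(S) \le F(S'), \qquad \forall U \in \mathcal{U}^{n\times n}: F(U^HSU) = F(S). \tag{27}$$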

Note that the invariance against unitary transformations implies that functions from the class $\mathcal{F}$ are actually functions of the set of eigenvalues of $S$ alone, the eigenbasis of $S$ being irrelevant. Therefore, instead of defining the class $\mathcal{F}$ based on matrix-to-scalar functions, we can equivalently define the class $\mathcal{F}$ based on vector-to-scalar functions.

Definition IV.2. A function $f: \mathbb{R}^n \to \mathbb{R}$ belongs to the class $\mathcal{F}$ if it is vector-monotonic and symmetric (permutation-invariant), i.e., if

$$0 \le s \le s' \Rightarrow f(s) \le f(s'), \qquad \forall \Pi \in \mathcal{P}^n: f(\Pi s) = f(s). \tag{28}$$
Both Definitions IV.1 and IV.2 provably characterize the same set of functions. Though the two definitions apply to different types of functions (matrix-to-scalar vs. vector-to-scalar), we use the same letter $\mathcal{F}$ to denote both sets. Which of the two is meant will always be clear from the context.

Besides $I(S)$, two other simple examples of utility functions from the class $\mathcal{F}$ are

$$\det(S) = \prod_{i=1}^{n} s_i, \qquad \operatorname{tr}(S) = \sum_{i=1}^{n} s_i, \tag{29}$$

where the $s_i$ denote the entries of $s$. More examples are given in Table I in Appendix B. Below the table are included some brief explanations that motivate the use of most of the utilities listed.
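The two conditions of (28) can be spot-checked for the utilities in (29). The randomized test below is only a sanity check, not a proof; the helper `spot_check_class_F` and the weighted-sum counterexample `f_weighted` (monotone but not symmetric, hence outside $\mathcal{F}$) are hypothetical illustrations.

```python
import numpy as np

def f_det(s):
    """det(S) written as a function of the profile s, cf. (29)."""
    return float(np.prod(s))

def f_tr(s):
    """tr(S) written as a function of the profile s, cf. (29)."""
    return float(np.sum(s))

def f_weighted(s):
    """Hypothetical counterexample: monotone but not permutation-invariant."""
    return float(s[0] + 2.0 * s[1])

def spot_check_class_F(f, n, rng, trials=1000):
    """Randomized (non-exhaustive) test of the two conditions in (28)."""
    for _ in range(trials):
        s = rng.exponential(size=n)
        s_up = s + rng.exponential(size=n)              # s <= s_up entrywise
        if f(s) > f(s_up) + 1e-12:                       # monotonicity violated
            return False
        if not np.isclose(f(s), f(rng.permutation(s))):  # symmetry violated
            return False
    return True

rng = np.random.default_rng(9)
results = {f.__name__: spot_check_class_F(f, 4, rng) for f in (f_det, f_tr, f_weighted)}
print(results)
```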

V. Problem Statement 

For a fixed coherence time $T$ and training duration $T_\tau$, let us define the compact set of admissible values of the pilot-precoder pair $(P, Q)$ as

$$\mathcal{PQ} = \{(P,Q) \in \mathbb{C}_+^{N_T\times N_T}\times\mathbb{C}_+^{N_T\times N_T} \mid \operatorname{tr}(P) + (T-T_\tau)\operatorname{tr}(Q) \le T\mu\}. \tag{30}$$

Here, the pilot energy $\operatorname{tr}(P)$ and the transmit power $\operatorname{tr}(Q)$ are related via the energy conservation equation

$$\operatorname{tr}(P) + (T-T_\tau)\operatorname{tr}(Q) \le T\mu, \tag{31}$$

where the scalar $\mu$ stands for the maximum average energy consumption per time unit of the system.

If the training duration $T_\tau$ is also subject to optimization, the full-fledged problem of joint pilot and precoder optimization reads in its most general formulation as

$$\max_{T_\tau\in\{1,\ldots,T-1\}}\ \max_{(P,Q)\in\mathcal{PQ}}\ \frac{T-T_\tau}{T}\,f(s(P,Q)), \tag{32}$$

where the output value of the function $f$ represents a utility per data (non-training) channel use, and the factor $\frac{T-T_\tau}{T}$ accounts for the loss due to the time invested in channel estimation. Accordingly, the quantity $\frac{T-T_\tau}{T}f(s(P,Q))$ represents the average utility per channel use.
In [1], the authors postulate for a similar setup that the receiver should have a representative estimate of the complete channel state, described by $N_TN_R$ fading coefficients. Therefore, they assume that the training duration $T_\tau$ should be at least the number of transmit antennas $N_T$, so as to generate at least as many observables as there are coefficients to estimate. However, in the case where only a limited number of data streams are to be precoded, it might be more economical to only estimate a properly chosen subspace of the channel covariance spanned by the stronger eigenmodes. In fact, since $T_\tau$ is defined as the number of columns of the pilot matrix $T$, and given that all utility functions and constraints depend on $T$ only via its Gram matrix $P = TT^H$, we can set the training duration equal to the rank of $P$, i.e., $T_\tau = \operatorname{rank}(P) \le N_T$, and accordingly reduce the search interval in (32) from $\{1,\ldots,T-1\}$ down to $\{1,\ldots,\min(T-1, N_T)\}$. In any case, the optimization over $T_\tau$ is over a finite set and can be solved by an exhaustive search. Therefore, we will leave this problem aside until Section IX and focus in the meantime on the inner problem:

$$\max_{(P,Q)\in\mathcal{PQ}} f(s(P,Q)). \tag{33}$$

In the next two sections, based on Problem (33), we will treat the partial problems that consist in optimizing one of the two variables $P$ and $Q$ while the other variable has a constant value. These individual optimizations will be two components of an algorithmic approach that aims to solve the joint problem (33). However, they may also be considered as two stand-alone problems in their own right.
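Because the outer maximization in (32) runs over a finite set, it can be carried out by exhaustive search. The sketch below illustrates only this outer loop; the inner problem (33) is not solved. As a placeholder, it assumes a heuristic allocation (half of the energy budget $T\mu$ on pilots, equal powers on the $T_\tau$ strongest eigenvectors of $R$ for both $P$ and $Q$, remaining energy spent according to (31)) and uses the stand-in utility $f(s) = \sum_i \log(1+s_i)$, which is monotone and symmetric. The correlation matrix, $T$, $\mu$ and all helper names are hypothetical.

```python
import numpy as np

def psd_sqrt(A):
    """Hermitian square root of a Hermitian PSD matrix."""
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T

def effective_snr(R, P, Q):
    """Effective SNR (18)."""
    R_err = np.linalg.inv(np.linalg.inv(R) + P)
    Rh = psd_sqrt(R - R_err)
    return Rh @ Q @ Rh / (1.0 + np.trace(Q @ R_err).real)

def f_stand_in(s):
    """Stand-in utility sum_i log(1 + s_i) = log det(I + S): monotone and symmetric."""
    return float(np.sum(np.log1p(np.clip(s, 0.0, None))))

def search_training_length(R, T_coh, mu, pilot_share=0.5):
    """Exhaustive search over T_tau in {1, ..., min(T_coh - 1, N_T)}, cf. (32).
    The inner allocation is a hypothetical heuristic, not a solution of (33)."""
    n_t = R.shape[0]
    lam, U = np.linalg.eigh(R)
    U = U[:, np.argsort(lam)[::-1]]                  # eigenvectors of R, strongest first
    values = {}
    for t_tau in range(1, min(T_coh - 1, n_t) + 1):
        U_t = U[:, :t_tau]
        proj = U_t @ U_t.conj().T                    # projector on the T_tau strongest eigenmodes
        P = pilot_share * T_coh * mu / t_tau * proj  # tr(P) = pilot_share * T_coh * mu
        mu_Q = (T_coh * mu - np.trace(P).real) / (T_coh - t_tau)  # (31) met with equality
        Q = mu_Q / t_tau * proj
        s = np.linalg.eigvalsh(effective_snr(R, P, Q))
        values[t_tau] = (T_coh - t_tau) / T_coh * f_stand_in(s)   # average utility per channel use
    best = max(values, key=values.get)
    return best, values

idx = np.arange(4)
R = 0.8 ** np.abs(idx[:, None] - idx[None, :])       # hypothetical correlation, N_T = 4
best, values = search_training_length(R, T_coh=20, mu=1.0)
print(best, values)
```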

VI. Precoder Design for Prescribed Pilots 

In this section, we consider the optimization of the transmit covariance $Q$ alone, while the pilot Gram $P$ has a fixed value. In a first approach, we will keep the matrix notation $F(S)$ instead of the equivalent vector notation $f(s)$, as we will first investigate the problems in the matrix domain. The problem at hand reads as

$$Q^*(P) = \operatorname*{argmax}_{Q\in\mathcal{Q}} F(S(P,Q)) \tag{34}$$

where the search set $\mathcal{Q}$ is bounded by a trace constraint

$$\mathcal{Q} = \{Q \in \mathbb{C}_+^{N_T\times N_T} : \operatorname{tr}(Q) \le \mu_Q\}. \tag{35}$$

The constant $\mu_Q$ may be computed from the energy conservation relation (31) as $\mu_Q = \frac{T\mu - \operatorname{tr}(P)}{T-T_\tau}$. It may as well be considered as some arbitrary constant.

A. Preliminaries 

Prior to delving into analytical derivations, it is instructive to take a glance at how $S(P,Q)$ depends on its second argument $Q$, in order to get to grips with the optimization problem at hand. In the expression of the matrix-to-matrix function

$$Q \mapsto S(P,Q) = \frac{\hat{R}^{1/2}Q\hat{R}^{1/2}}{1 + \operatorname{tr}(Q\tilde{R})}, \tag{36}$$

we see that the argument $Q$ appears in the matrix-valued numerator, and inside a trace operator in the denominator. This function $Q \mapsto S(P,Q)$ is thus reminiscent of fractions of monomials such as $q \mapsto \frac{aq}{1+bq}$, except that it is defined for matrices.

In fact, the function $Q \mapsto S(P,Q)$ pertains to what can be defined in the following Definition VI.1 as a generalization of linear fractional functions. The latter are commonly defined for the scalar case (e.g., [17, Sec. 2.3.3]).



Definition VI.1. Let $\mathcal{X} \subset \mathbb{C}^{n\times n}$ denote a set of Hermitian matrices of size $n\times n$ whose elements $X \in \mathcal{X}$ satisfy $\operatorname{tr}(BX) \neq -1$ with some given Hermitian matrix $B \in \mathbb{C}^{n\times n}$. A function $X \mapsto \phi(X;A,B)$ that is defined as

$$\phi: \mathcal{X} \to \mathbb{C}^{m\times m}, \quad X \mapsto \phi(X;A,B) = \frac{AXA^H}{1 + \operatorname{tr}(BX)} \tag{37}$$

shall be called a linear fractional function with parameters $A \in \mathbb{C}^{m\times n}$ and $B \in \mathbb{C}^{n\times n}$.

Note that the Hermitianity of $B$ and of the argument $X$ ensures the Hermitianity of the image $\phi(X;A,B)$. Linear fractional functions may or may not be injective, depending on the properties of the parameter $A$.
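A direct implementation of (37), assuming NumPy and hypothetical parameter values, shows that the image is Hermitian for Hermitian $X$ and $B$. The second part illustrates non-injectivity: with $A = [1\ \ 0]$ (full row rank) and $B = 0$, the arguments $\operatorname{diag}(1,0)$ and $\operatorname{diag}(1,3)$ have the same image, because the latter violates the range condition of case 2) in Lemma VI.1.

```python
import numpy as np

def lin_frac(X, A, B):
    """Linear fractional function (37): phi(X; A, B) = A X A^H / (1 + tr(B X)).
    For Hermitian B and X, tr(B X) is real; it must differ from -1."""
    t = np.trace(B @ X).real
    if np.isclose(t, -1.0):
        raise ValueError("tr(BX) = -1: X lies outside the domain of phi")
    return A @ X @ A.conj().T / (1.0 + t)

rng = np.random.default_rng(10)
C = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
X = C @ C.conj().T                                  # Hermitian argument
B = np.diag([0.5, 1.0, 2.0])                        # Hermitian parameter
A = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))
Y = lin_frac(X, A, B)
print(np.allclose(Y, Y.conj().T))                   # Hermitian image

# Non-injective case: A = [1, 0] has full row rank, but range(X2) is not inside range(A^H)
A1, B0 = np.array([[1.0, 0.0]]), np.zeros((2, 2))
X1, X2 = np.diag([1.0, 0.0]), np.diag([1.0, 3.0])
print(np.allclose(lin_frac(X1, A1, B0), lin_frac(X2, A1, B0)))  # same image
```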



Lemma VI.1. The linear fractional function $X \mapsto \phi(X;A,B)$ from Definition VI.1 is injective (one-to-one) if at least one of the following two conditions applies:

1) The parameter $A$ has full column rank.

2) The parameter $A$ has full row rank and the domain $\mathcal{X}$ is such that $\forall X \in \mathcal{X}: \operatorname{range}(X) \subseteq \operatorname{range}(A^H)$.¹

In these two respective cases, its inverse function $\phi^{-1}: \phi(\mathcal{X};A,B) \to \mathcal{X},\ Y \mapsto \phi^{-1}(Y;A,B)$ is

1) linear fractional with parameters $A^+$ and $-A^{+H}BA^+$, where $A^+ = (A^HA)^{-1}A^H$ denotes the left pseudoinverse of $A$, i.e., $\phi^{-1}(\cdot;A,B) = \phi(\cdot;A^+,-A^{+H}BA^+)$;

2) linear fractional with parameters $A^+$ and $-A^{+H}BA^+$, where $A^+ = A^H(AA^H)^{-1}$ denotes the right pseudoinverse of $A$, i.e., $\phi^{-1}(\cdot;A,B) = \phi(\cdot;A^+,-A^{+H}BA^+)$.

¹ In case $A$ has neither full column nor full row rank, one can bring the problem back to one of the two considered cases by an appropriate rank reduction.

Proof: See Appendix C. ■
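Lemma VI.1 can be verified by a round trip $\phi^{-1}(\phi(X)) = X$ in both cases. The sketch relies on `numpy.linalg.pinv`, whose Moore-Penrose pseudoinverse coincides with the left pseudoinverse $(A^HA)^{-1}A^H$ when $A$ has full column rank and with the right pseudoinverse $A^H(AA^H)^{-1}$ when $A$ has full row rank; the random matrices are illustrative assumptions.

```python
import numpy as np

def lin_frac(X, A, B):
    """phi(X; A, B) = A X A^H / (1 + tr(B X)), cf. (37)."""
    return A @ X @ A.conj().T / (1.0 + np.trace(B @ X).real)

def lin_frac_inverse(Y, A, B):
    """Inverse from Lemma VI.1: phi(Y; A^+, -A^{+H} B A^+), with A^+ the
    left (full column rank) or right (full row rank) pseudoinverse of A."""
    A_pinv = np.linalg.pinv(A)
    return lin_frac(Y, A_pinv, -A_pinv.conj().T @ B @ A_pinv)

rng = np.random.default_rng(11)

def crandn(*shape):
    """Complex Gaussian test matrix."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

B = np.diag([0.3, 1.0, 2.0])                         # Hermitian parameter

# Case 1: A (4 x 3) has full column rank
A_tall = crandn(4, 3)
C = crandn(3, 3); X = C @ C.conj().T
ok_tall = np.allclose(lin_frac_inverse(lin_frac(X, A_tall, B), A_tall, B), X)

# Case 2: A (2 x 3) has full row rank and range(X) lies in range(A^H)
A_wide = crandn(2, 3)
D = crandn(2, 2); X_r = A_wide.conj().T @ (D @ D.conj().T) @ A_wide
ok_wide = np.allclose(lin_frac_inverse(lin_frac(X_r, A_wide, B), A_wide, B), X_r)
print(ok_tall, ok_wide)
```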



In the following, we will optimize over $S(P,Q)$ rather than over $Q$ directly, and thus the above Lemma VI.1 can be used for computing the optimal transmit covariance $Q$ from the optimal $S$, by means of the appropriate inverse linear fractional function.



B. Convexity of the set of feasible S for prescribed pilots 

Prescribing the pilot Gram $P$ means that the matrices $\tilde{R} = (R^{-1} + P)^{-1}$ and $\hat{R} = R - \tilde{R}$ are prescribed. Therefore, the function $Q \mapsto S(P,Q)$ as given in (36) is linear fractional with parameters $A = \hat{R}^{1/2}$ and $B = \tilde{R}$, i.e.,

$$S(P,Q) = \phi(Q;\hat{R}^{1/2},\tilde{R}). \tag{38}$$

With this new notation, Problem (34) reads as

$$Q^*(P) = \operatorname*{argmax}_{Q\in\mathcal{Q}} F\big(\phi(Q;\hat{R}^{1/2},\tilde{R})\big). \tag{39}$$



The key property of linear fractional functions that we need for understanding Problem (39) is that they preserve the linearity of segments.

Lemma VI.2. An injective linear fractional function $\varphi(\cdot) = \phi(\cdot;A,B)$ with some given parameters $A$ and $B$ uniquely maps linear segments onto linear segments in a one-to-one manner, i.e.,

$$\forall (X_1, X_2, \alpha) \in \mathcal{X}^2\times[0,1],\ \exists\beta \in [0,1]: \varphi(\alpha X_1 + (1-\alpha)X_2) = \beta\varphi(X_1) + (1-\beta)\varphi(X_2). \tag{40}$$

Proof: This is readily verified by inserting the explicit value

$$\beta = \frac{\alpha\big(1 + \operatorname{tr}(BX_1)\big)}{1 + \alpha\operatorname{tr}(BX_1) + (1-\alpha)\operatorname{tr}(BX_2)} \tag{41}$$

into the equality (40). ■
Figure 1 symbolically depicts the behavior of linear fractional functions: a convex combination of two points is mapped onto a convex combination of the respective images of said points, thus preserving segments. They are not linear functions, though, because $\alpha$ and $\beta$ can be different.

Fig. 1. Linear fractional functions preserve segments.
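The explicit weight (41) can be checked numerically. In this sketch (random, hypothetical $A$, $B$, $X_1$, $X_2$ and $\alpha = 0.3$), the image of the convex combination with weight $\alpha$ equals the convex combination of the images with weight $\beta$, and $\beta$ generally differs from $\alpha$.

```python
import numpy as np

def lin_frac(X, A, B):
    """phi(X; A, B) = A X A^H / (1 + tr(B X)), cf. (37)."""
    return A @ X @ A.conj().T / (1.0 + np.trace(B @ X).real)

rng = np.random.default_rng(12)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
B = np.diag([0.2, 0.5, 1.0])
C1 = rng.standard_normal((3, 3)); X1 = C1 @ C1.T    # hypothetical PSD arguments
C2 = rng.standard_normal((3, 3)); X2 = C2 @ C2.T

alpha = 0.3
t1, t2 = np.trace(B @ X1).real, np.trace(B @ X2).real
beta = alpha * (1 + t1) / (1 + alpha * t1 + (1 - alpha) * t2)   # (41)

lhs = lin_frac(alpha * X1 + (1 - alpha) * X2, A, B)
rhs = beta * lin_frac(X1, A, B) + (1 - beta) * lin_frac(X2, A, B)
print(np.allclose(lhs, rhs), alpha, beta)          # same segment, different weights
```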



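Corollary VI.1. Linear fractional mappings preserve set convexity.

Proof: Take a pair $(X_1, X_2) \in \mathcal{X}^2$ with a convex $\mathcal{X}$. According to Lemma VI.2, the segment between $X_1$ and $X_2$, which lies in $\mathcal{X}$, is mapped one-to-one onto the segment between $\varphi(X_1)$ and $\varphi(X_2)$. Hence any convex combination of $\varphi(X_1)$ and $\varphi(X_2)$ is the image of a point of $\mathcal{X}$, and the image $\varphi(\mathcal{X})$ is convex. ■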
As a consequence, $\mathcal{S}(P,\mathcal{Q}) = \{S(P,Q) : Q \in \mathcal{Q}\}$ is a convex set because $\mathcal{Q}$ is convex [cf. (35)]. So if a utility $F$ is concave in $S$, then Problem (34), which may be rewritten in the $S$-domain as

$$S^*(P) = \operatorname*{argmax}_{S\in\mathcal{S}(P,\mathcal{Q})} F(S), \tag{42}$$

is convex. The optimal transmit covariance $Q^*(P)$ is then computed from $S^*(P)$ by means of the appropriate inverse linear fractional function (cf. Lemma VI.1). More generally speaking, if $F$ is quasi-concave in $S$, then Problem (34) can be recast into a convex problem by an appropriate transformation. Even if $F$ is only unimodal on $\mathcal{S}(P,\mathcal{Q})$, that is, it has a single local maximum on the convex compact set $\mathcal{S}(P,\mathcal{Q})$, one can still optimize it efficiently via bisection. The mutual information $I$ is one example of a concave utility. Other examples of concave or log-concave (quasi-concave) utilities are given in Table I in Appendix B.
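Solving (41) for $\alpha$ gives $\alpha = \beta(1+\operatorname{tr}(BX_2))/\big((1-\beta)(1+\operatorname{tr}(BX_1)) + \beta(1+\operatorname{tr}(BX_2))\big)$, which makes the convexity of $\mathcal{S}(P,\mathcal{Q})$ constructive: any convex combination of two feasible effective SNRs is produced by a convex combination of the corresponding transmit covariances, which stays in $\mathcal{Q}$. The sketch below checks this for one hypothetical choice of $R$, $P$ and $\mu_Q$, with $A = \hat{R}^{1/2}$ and $B = \tilde{R}$ as in (38); `random_feasible_Q` is an illustrative helper.

```python
import numpy as np

def psd_sqrt(A):
    """Hermitian square root of a Hermitian PSD matrix."""
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T

def lin_frac(X, A, B):
    """phi(X; A, B) = A X A^H / (1 + tr(B X)), cf. (37)."""
    return A @ X @ A.conj().T / (1.0 + np.trace(B @ X).real)

rng = np.random.default_rng(13)
n_t, mu_Q = 3, 2.0
idx = np.arange(n_t)
R = 0.7 ** np.abs(idx[:, None] - idx[None, :])       # hypothetical correlation
P = np.diag([3.0, 1.0, 0.0])                         # hypothetical pilot Gram
R_err = np.linalg.inv(np.linalg.inv(R) + P)
A, B = psd_sqrt(R - R_err), R_err                    # parameters of (38)

def random_feasible_Q():
    """Random PSD Q with tr(Q) = mu_Q, i.e., a point of the set (35)."""
    G = rng.standard_normal((n_t, n_t)) + 1j * rng.standard_normal((n_t, n_t))
    Q = G @ G.conj().T
    return mu_Q * Q / np.trace(Q).real

Q1, Q2 = random_feasible_Q(), random_feasible_Q()
S1, S2 = lin_frac(Q1, A, B), lin_frac(Q2, A, B)
t1, t2 = np.trace(B @ Q1).real, np.trace(B @ Q2).real

beta = 0.6                                           # target point beta*S1 + (1 - beta)*S2
alpha = beta * (1 + t2) / ((1 - beta) * (1 + t1) + beta * (1 + t2))  # (41) solved for alpha
Q_mix = alpha * Q1 + (1 - alpha) * Q2                # still PSD with tr(Q_mix) = mu_Q

print(np.allclose(lin_frac(Q_mix, A, B), beta * S1 + (1 - beta) * S2), alpha)
```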
The next theorem specifies an important property of the range space of the optimal $Q^*(P)$.

Theorem VI.1. For any utility $F \in \mathcal{F}$ and a prescribed pilot Gram $P$, the range space of the optimal transmit covariance $Q^*(P)$ must be contained in the range space of the channel estimate covariance $\bar{R}$:

$$\operatorname{range}(Q^*(P)) \subseteq \operatorname{range}(\bar{R}). \qquad (43)$$

Proof: See Appendix D. ■
Note that, together with Identity (7), Theorem VI.1 directly implies the rank inequality

$$\operatorname{rank}(Q^*(P)) \le \operatorname{rank}(\bar{R}) = \operatorname{rank}(P), \qquad (44)$$

or in words,

$$\text{number of streams} \le \text{number of pilot symbols}. \qquad (45)$$



The idea behind the proof of Theorem VI.1 is that, if $Q^*(P)$ had eigenvectors (transmit directions) lying outside the range space of the estimate covariance $\bar{R}$, then the transmitter would be radiating some of its power into channel directions of which the receiver has no estimate (and which it thus cannot detect coherently), thereby wasting power. As a particular consequence, (44) tells us that the number of precoded streams should never exceed the number of training symbols.

C. Convexity of the set of feasible s for prescribed pilots 



By virtue of the equivalence of Definitions IV.1 and IV.2, we may rewrite Problem (34) as

$$Q^*(P) = \arg\max_{Q \in \mathcal{Q}} f(s(P,Q)), \qquad (46)$$

or alternatively, in the $s$-domain [compare with (42)], as

$$s^*(P) = \arg\max_{s \in \mathsf{s}(P,\mathcal{Q})} f(s). \qquad (47)$$

We now focus on the eigenvalue profile $s$ instead of the matrix $S$, though both problem formulations [matrix-based (34) and vector-based (46)] are in fact equivalent. In the previous subsection, we have shown that the set $\mathcal{S}(P,\mathcal{Q})$ is convex. Note, however, that this convexity does not generally imply (nor is it implied by) the convexity of the set of eigenvalue profiles $\mathsf{s}(P,\mathcal{Q})$. Nevertheless, it turns out that $\mathsf{s}(P,\mathcal{Q})$ is also convex and has a simplex shape, whose vertices are characterized by Theorem VI.2 below.

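Let $\omega_i$ denote the non-increasingly ordered eigenvalues of the generalized eigenvalue problem

$$\bar{R} v_i = \omega_i \left(\mu_Q^{-1} I + \tilde{R}\right) v_i. \qquad (48)$$

Due to $\operatorname{rank}(\bar{R}) = \operatorname{rank}(P)$ [cf. (7)], only the first $r_P = \operatorname{rank}(P)$ eigenvalues $\omega_i$ are different from zero.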
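Theorem VI.2. The set [cf. (35)]

$$\mathsf{s}(P,\mathcal{Q}) = \left\{ \lambda\!\left( \frac{\bar{R}^{1/2} Q \bar{R}^{1/2}}{1 + \operatorname{tr}(Q\tilde{R})} \right) \,\middle|\, Q \in \mathbb{C}^{N_T \times N_T},\ Q \succeq 0,\ \operatorname{tr}(Q) \le \mu_Q \right\} \qquad (49)$$

is a simplex given by the convex hull of the origin $\sigma^{(0)} = 0$ and of the $r_P$ linearly independent points

$$\sigma^{(n)} = H(\omega_1, \ldots, \omega_n) \sum_{i=1}^{n} e_i, \qquad n \in \{1, \ldots, r_P\}, \qquad (50)$$

where $[e_1, \ldots, e_{N_T}] = I$ is the canonical basis, and $H(x_1, \ldots, x_n) = \left(\sum_{i=1}^{n} x_i^{-1}\right)^{-1}$ for $n$ arguments $x_1, \ldots, x_n$ denotes the harmonic mean thereof divided by $n$.

Proof: See Appendix E. ■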
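Theorem VI.2 translates directly into a few lines of linear algebra. The following Python/NumPy sketch computes $\tilde{R}$ and $\bar{R}$ as in (104)-(105), the generalized eigenvalues $\omega_i$ of (48) via a Cholesky reduction to a standard eigenproblem, and the vertices (50). The function names and the diagonal example matrices are hypothetical.

```python
import numpy as np

def estimate_covariances(R, P):
    """R_bar = R - R_tilde with R_tilde = (R^{-1} + P)^{-1}, cf. (104)-(105)."""
    R_tilde = np.linalg.inv(np.linalg.inv(R) + P)
    return R - R_tilde, R_tilde

def generalized_eigs(R_bar, R_tilde, mu_Q):
    """Eigenvalues omega_i of R_bar v = omega (mu_Q^{-1} I + R_tilde) v, cf. (48),
    in non-increasing order, via a Cholesky reduction to a standard problem."""
    M = np.eye(len(R_bar)) / mu_Q + R_tilde        # positive definite
    L_inv = np.linalg.inv(np.linalg.cholesky(M))
    C = L_inv @ R_bar @ L_inv.conj().T
    return np.sort(np.linalg.eigvalsh(C))[::-1]

def simplex_vertices(omega, r_P):
    """Vertices sigma^(n) = H(omega_1, ..., omega_n) (e_1 + ... + e_n), cf. (50)."""
    vertices = []
    for n in range(1, r_P + 1):
        v = np.zeros(len(omega))
        v[:n] = 1.0 / np.sum(1.0 / omega[:n])      # harmonic mean divided by n
        vertices.append(v)
    return vertices

# Hypothetical example: diagonal R and P, so all matrices share one eigenbasis.
R = np.diag([1.5, 0.5])
P = np.diag([2.0, 1.0])
mu_Q = 1.0
R_bar, R_tilde = estimate_covariances(R, P)
omega = generalized_eigs(R_bar, R_tilde, mu_Q)
print(simplex_vertices(omega, r_P=2))
```

In this diagonal example, one can verify that $\sigma^{(2)}$ is attained by the equalizing power split $q_i \propto 1/\bar{r}_i$ with $\sum_i q_i = \mu_Q$.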



As a byproduct, the proof of Theorem VI.2 reveals that if the set of eigenvectors of $\bar{R}$ is contained in the set of eigenvectors of $\tilde{R}$, i.e., $\mathcal{C}(U_{\bar{R}}) \subseteq \mathcal{C}(U_{\tilde{R}})$, then it is optimal with respect to any utility $F \in \mathcal{F}$ to choose the eigenbasis $U_{Q^*(P)}$ of the optimal matrix $Q^*(P)$ such that

$$\mathcal{C}(U_{Q^*(P)}) \subseteq \mathcal{C}(U_{\bar{R}}). \qquad (51)$$

Note that this requirement is stronger than the range space inclusion property of Theorem VI.1 [cf. (43)]. This particular situation of eigenbasis alignment $\mathcal{C}(U_{\bar{R}}) \subseteq \mathcal{C}(U_{\tilde{R}})$ occurs, for example, when

• using $N_T$ unitary pilots (i.e., $P$ is a scaled identity matrix),

• the channel gains are independently and identically distributed ($R = I$),

• the channel estimation error vanishes ($\tilde{R} = 0$, $\bar{R} = R$),

• the pilots are aligned with the channel covariance, i.e., $\mathcal{C}(U_P) \subseteq \mathcal{C}(U_R)$.

As we shall see later in Section VIII, the latter condition $\mathcal{C}(U_P) \subseteq \mathcal{C}(U_R)$ is in fact necessary for joint optimality of $P$ and $Q$.

Fig. 2. Sketch of a simplex set $\mathsf{s}(P,\mathcal{Q})$. The so-called Pareto border $\partial^{+}\mathsf{s}(P,\mathcal{Q})$ contains those points from $\mathsf{s}(P,\mathcal{Q})$ that are not dominated by any other point from $\mathsf{s}(P,\mathcal{Q})$, and is the convex hull of $\sigma^{(n)}$ for $n \in \{1, \ldots, r_P\}$ (excluding the origin).

VII. Pilot Design for a Prescribed Precoder 



To complement the previous Section VI, we now swap the roles of $P$ and $Q$ and consider the optimization of the pilot Gram $P$ under a trace constraint, while the transmit covariance $Q$ has a fixed value. This problem reads as

$$P^*(Q) = \arg\max_{P \in \mathcal{P}} F(S(P,Q)) \qquad (52)$$

with a search set

$$\mathcal{P} = \left\{ P \in \mathbb{C}^{N_T \times N_T} \,\middle|\, P \succeq 0,\ \operatorname{tr}(P) \le \mu_P \right\}, \qquad (53)$$

or alternatively, in the $S$-domain,

$$S^*(Q) = \arg\max_{S \in \mathcal{S}(\mathcal{P},Q)} F(S). \qquad (54)$$

The constant $\mu_P$ may be computed from the energy conservation relation (31) as $\mu_P = T\mu - (T - T_\tau)\operatorname{tr}(Q)$, or it may be considered as some given constant.

Finally, in analogy to the rank inequality (44) between $P$ and $Q^*(P)$, which follows from Theorem VI.1 and applies to Problem (34), we also have a corresponding rank inequality for Problem (52).

Theorem VII.1. For any utility $F \in \mathcal{F}$ and a prescribed transmit covariance $Q$, the rank of the optimal pilot Gram $P^*(Q)$ is not larger than the rank of $Q$:

$$\operatorname{rank}(P^*(Q)) \le \operatorname{rank}(Q). \qquad (55)$$

Proof: See Appendix F. ■

In words, we can state this as [compare with (45)]

$$\text{number of streams} \ge \text{number of pilot symbols}. \qquad (56)$$

The interpretation behind this rank inequality is that, if there were more orthogonal training directions than precoded data streams, we would necessarily be wasting some pilot energy on directions that are not used for transmission anyway.
Next, we will show that the set $\mathcal{S}(\mathcal{P},Q)$ is convex. We write out $\tilde{R}$ as $R - \bar{R}$; then $S$ reads as [cf. (18)]

$$S = \frac{\bar{R}^{1/2} Q \bar{R}^{1/2}}{1 + \operatorname{tr}(QR) - \operatorname{tr}(Q\bar{R})}, \qquad (57)$$

which is unitarily equivalent to

$$S' = \frac{Q^{1/2} \bar{R} Q^{1/2}}{1 + \operatorname{tr}(QR) - \operatorname{tr}(Q\bar{R})}, \qquad (58)$$

since the Hermitian matrices $\bar{R}^{1/2} Q \bar{R}^{1/2}$ and $Q^{1/2} \bar{R} Q^{1/2}$ have the same eigenvalues because of the identity $\lambda(AB) = \lambda(BA)$. Due to the invariance property in Definition IV.1, the matrices $S$ and $S'$ yield the same utility, i.e., $F(S) = F(S')$ for any $F \in \mathcal{F}$, so they can be used interchangeably. Let us further abbreviate $1 + \operatorname{tr}(QR)$ as $t$, so that we get

$$S' = \frac{Q^{1/2} \bar{R} Q^{1/2}}{t - \operatorname{tr}(Q\bar{R})}. \qquad (59)$$



By comparing Expression (59) with the definition of linear fractional functions (cf. Definition VI.1), we identify $S'$ as a linear fractional function of $\bar{R}$ with parameters $A = t^{-1/2} Q^{1/2}$ and $B = -t^{-1} Q$, i.e.,

$$S'(P,Q) = \varphi\!\left(\bar{R};\ t^{-1/2} Q^{1/2},\ -t^{-1} Q\right). \qquad (60)$$

In Appendix G, we show that the set of feasible $\bar{R}$ is convex, from which it follows immediately with Corollary VI.1 that $\mathcal{S}'(\mathcal{P},Q)$ is a convex set. Provided that the utility $F$ is concave, quasi-concave or unimodal, Problem (54) is convex in the domain of $S'$ and, as such, can be solved efficiently with convex optimization methods.†
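A numerical sanity check of (57)-(60) may help. The sketch below draws random real positive semidefinite $R$, $P$, $Q$ (hypothetical data) and verifies that $S$ and $S'$ share their eigenvalues and that $S'$ coincides with the linear fractional form (60). It assumes that Definition VI.1 reads $\varphi(X; A, B) = A X A^H / (1 + \operatorname{tr}(BX))$, the form consistent with (41) and (59)-(60).

```python
import numpy as np

def psd_sqrt(M):
    """Principal square root of a Hermitian PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T

def phi(X, A, B):
    """Assumed form of Definition VI.1: A X A^H / (1 + tr(B X))."""
    return A @ X @ A.conj().T / (1.0 + np.trace(B @ X).real)

rng = np.random.default_rng(1)
N = 3
G = rng.standard_normal((N, N)); R = G @ G.T + np.eye(N)   # channel covariance
G = rng.standard_normal((N, N)); P = G @ G.T               # pilot Gram
G = rng.standard_normal((N, N)); Q = G @ G.T               # transmit covariance

R_tilde = np.linalg.inv(np.linalg.inv(R) + P)   # estimation error covariance
R_bar = R - R_tilde                             # channel estimate covariance
t = 1.0 + np.trace(Q @ R)

R_bar_half, Q_half = psd_sqrt(R_bar), psd_sqrt(Q)
S = R_bar_half @ Q @ R_bar_half / (1.0 + np.trace(Q @ R_tilde))   # (57)
S_prime = Q_half @ R_bar @ Q_half / (t - np.trace(Q @ R_bar))     # (59)
S_phi = phi(R_bar, Q_half / np.sqrt(t), -Q / t)                   # (60)

print(np.linalg.eigvalsh(S), np.linalg.eigvalsh(S_prime))
```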



VIII. Jointly Pareto Optimal Pilot-Precoder Pairs 

A. Problem statement 

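When restated in the domain of feasible profiles $s$ [cf. (28)], the original problem (33) reads as

$$\max_{s \in \mathsf{s}(\mathcal{P}\mathcal{Q})} f(s), \qquad (61)$$

where the feasible set $\mathsf{s}(\mathcal{P}\mathcal{Q})$ is

$$\mathsf{s}(\mathcal{P}\mathcal{Q}) = \left\{ s(P,Q) \,\middle|\, (P,Q) \in \mathcal{P}\mathcal{Q} \right\}, \qquad (62)$$

and $\mathcal{P}\mathcal{Q}$ was defined in (30). Furthermore, we can exploit the monotonicity of utilities $f \in \mathcal{F}$ [cf. (24)] to restrict the search set $\mathsf{s}(\mathcal{P}\mathcal{Q})$ to its Pareto border alone:

$$\max_{s \in \partial^{+}\mathsf{s}(\mathcal{P}\mathcal{Q})} f(s). \qquad (63)$$

Said Pareto border $\partial^{+}\mathsf{s}(\mathcal{P}\mathcal{Q})$ consists of Pareto optimal points, i.e., points in $\mathsf{s}(\mathcal{P}\mathcal{Q})$ that are not dominated by any other point in $\mathsf{s}(\mathcal{P}\mathcal{Q})$, or in mathematical notation:

$$\partial^{+}\mathsf{s}(\mathcal{P}\mathcal{Q}) = \left\{ s' \in \mathsf{s}(\mathcal{P}\mathcal{Q}) \,\middle|\, \nexists\, s'' \in \mathsf{s}(\mathcal{P}\mathcal{Q}) : s'' \ge s' \text{ with } s'' \ne s' \right\}. \qquad (64)$$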
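Definition (64) can be illustrated on a finite sample of profiles, e.g., profiles obtained by evaluating $s(P,Q)$ on a few randomly drawn feasible pairs. The following Python sketch (the helper name `pareto_border` and the sample points are hypothetical) discards every point that is dominated in the sense of (64). It is a brute-force illustration of the definition only, not the Pareto-border computation developed in the remainder of this section.

```python
import numpy as np

def pareto_border(points):
    """Return the points of a finite set that no other point dominates, cf. (64)."""
    pts = np.asarray(points, dtype=float)
    keep = []
    for s in pts:
        # s'' dominates s if s'' >= s componentwise and s'' != s.
        dominated = np.any(np.all(pts >= s, axis=1) & np.any(pts > s, axis=1))
        if not dominated:
            keep.append(s)
    return np.array(keep)

samples = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.4, 0.4], [0.2, 0.9]]
print(pareto_border(samples))
```

Here only $[0.4, 0.4]$ is removed, since $[0.5, 0.5]$ dominates it.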




The practical computation of the joint global optimum (61) depends mainly on the properties of the utility function $f$ under consideration. In fact, whether the problem at hand is convex, non-convex, quasi-convex, etc., depends on the function $f$ and possibly also on the values of $R$ and $T$; hence there cannot be a generic optimization procedure that is guaranteed to converge to the global joint optimum. Instead, the problem must be analyzed case by case for every utility function and set of parameters. However, there exists an important subproblem of (61) that is common to all utility functions of the class $\mathcal{F}$ and can be solved in general, as we shall see: the computation of the search set $\partial^{+}\mathsf{s}(\mathcal{P}\mathcal{Q})$. The present Section VIII deals with this problem.

† As regards the optimization $P^*(Q)$ in the eigenvalue domain, it turns out that, unlike for the set $\mathsf{s}(P,\mathcal{Q})$ studied in the previous Section VI and characterized as a simplex in Theorem VI.2, there does not seem to exist a comparably simple analytic characterization of the set $\mathsf{s}(\mathcal{P},Q)$.



B. Number of pilot symbols and number of streams 

The joint problem (63) can be decomposed into an outer optimization (which we shall call energy boost‡) that consists in finding the optimal balance between the pilot symbol energy and the data symbol power, and an inner optimization of $s$ over a set $\mathsf{s}(\mathcal{P},\mathcal{Q})$:

$$\max_{\mu_P, \mu_Q \ge 0} \left\{ \max_{s \in \partial^{+}\mathsf{s}(\mathcal{P},\mathcal{Q})} f(s) \right\}. \qquad (65)$$

Note that the inner optimization is over the set $\partial^{+}\mathsf{s}(\mathcal{P},\mathcal{Q})$ and not over the set $\mathsf{s}(\mathcal{P}\mathcal{Q})$ as in (63): here, the sets $\mathcal{P}$ and $\mathcal{Q}$ are understood to be the trace-constrained sets defined in (53) and (35), respectively. The inner optimization inside the braces of (65) can as well be written as

$$\max_{(P,Q) \in \mathcal{P} \times \mathcal{Q}} f(s(P,Q)) = \max_{P \in \mathcal{P}} f\big(s(P,Q^*(P))\big) = \max_{Q \in \mathcal{Q}} f\big(s(P^*(Q),Q)\big) \qquad (66)$$

with $Q^*(P)$ and $P^*(Q)$ defined by (34) and (52), respectively. The jointly optimal pilot-precoder pair $(P^*, Q^*)$ must therefore simultaneously fulfill the rank inequalities (44) and (55) (the latter being set forth by Theorem VII.1), from which it follows that $P^*$ and $Q^*$ must have equal rank at the joint optimum. Since this rank equality holds regardless of the value of the pair $(\mu_P, \mu_Q)$, it also generally holds for the optimal pair $(P^*, Q^*)$ in Problem (65). Since said rank equality also holds independently of the value of $T_\tau$, it also holds for the full-fledged problem (32) (with training overhead taken into account), so that we can state that [compare with (45), (56)]

$$\text{number of streams} = \text{number of pilot symbols} \qquad (67)$$

is a necessary condition for a pilot-precoder pair $(P, Q)$ to be jointly optimal for Problems (66), (61), and (32).

C. Jointly optimal transmit and training directions 

A fortunate circumstance when treating the joint problem (33)/(61) is that the jointly optimal transmit and training directions have a very simple and intuitive characterization, stated in Theorem VIII.1 below. Let us rewrite Problem (63) as in (65), and only consider the inner optimization problem inside the curly braces of (65), namely

$$\max_{s \in \partial^{+}\mathsf{s}(\mathcal{P},\mathcal{Q})} f(s), \qquad (68)$$

which is the pilot-precoder joint optimization problem without energy boost.

Let the channel covariance $R$, the pilot Gram $P$ and the transmit covariance $Q$ have the following (reduced) eigendecompositions:

$$R = U_R \Lambda_R U_R^H, \qquad P = U_P \Lambda_P U_P^H, \qquad Q = U_Q \Lambda_Q U_Q^H.$$

Without loss of generality, we assume that the eigenvalues of $R$ are arranged in non-increasing order on the diagonal of $\Lambda_R$, whereas the eigenvalues in $\Lambda_P$ and $\Lambda_Q$ are not sorted in any specific order.



Theorem VIII.1. For any utility $f \in \mathcal{F}$, in the joint optimization problem (68), there is no loss of optimality in setting the eigenvectors of the pilot Gram $P$ (i.e., the left singular vectors of the pilot sequence) and the eigenvectors of the transmit covariance $Q$ (i.e., the left singular vectors of the precoder $F$) to be a common subset of the eigenvectors of the channel covariance $R$ corresponding to the largest eigenvalues of $R$. Formally, this is to say that the (reduced) eigenbases $U_P$ and $U_Q$ should satisfy

$$\mathcal{C}(U_P) = \mathcal{C}(U_Q) = \{u_{R,1}, \ldots, u_{R,r^*}\} \subseteq \mathcal{C}(U_R), \qquad (69)$$

where $U_R = [u_{R,1}, \ldots, u_{R,N_T}]$, and $r^* = \operatorname{rank}(P^*) = \operatorname{rank}(Q^*)$ denotes the pilot/precoder rank at the joint optimum $(P^*, Q^*)$ of Problem (68).§

Proof: See Appendix H. ■



Since Theorem VIII.1 holds irrespective of the value of the pair $(\mu_P, \mu_Q)$, it holds not only for the joint optimization without energy boost (68), but also for the joint optimization problem with energy boost (33)/(61).

‡ In the literature, the optimal balancing between pilot and data symbol powers under an overall average power constraint and for fixed time fractions assigned to training and data transmission is sometimes referred to as power boost (e.g., [11]). Our setup is different: the training duration $T_\tau$ is not fixed, but is given by the inner optimization via $T_\tau = \operatorname{rank}(P)$. The constraint for the outer optimization is not on powers, but on the sum of the pilot energy $\mu_P$ and the data symbol energy $(T - T_\tau)\mu_Q$. This is why we talk about energy boost.

§ Obviously, the rank $r^*$ is not known a priori before solving the problem. The notation in (69) is merely meant to indicate that $\mathcal{C}(U_P)$ and $\mathcal{C}(U_Q)$ should contain eigenvectors of $R$ corresponding to the largest eigenvalues of $R$.
Consequently, and without loss of optimality, we will align the eigenbases of $P$ and $Q$ in conformity with (69). The scalars $[r]_i = r_i$, $[p]_i = p_i$, and $[q]_i = q_i$ shall denote the eigenvalues of $R$, $P$, and $Q$, respectively. Under such assumptions, all matrices involved in the expression of the effective SNR (18), namely $\bar{R}$ and $\tilde{R}$ [cf. (5)], as well as $Q$, acquire the same eigenbasis $U_R$. We can readily see from Expression (18) that $S$ then inherits the (common) eigenvectors of $P$ and $Q$, i.e., $\mathcal{C}(U_S) = \mathcal{C}(U_P) = \mathcal{C}(U_Q) \subseteq \mathcal{C}(U_R)$, so that the profile $s$ is given by [cf. (18)]

$$s = \frac{\bar{r} \odot q}{1 + q^T \tilde{r}}, \qquad (70)$$

where $\odot$ denotes the componentwise product. Here, the eigenvalue vectors $\bar{r} = \bar{r}(p)$ and $\tilde{r} = \tilde{r}(p) = r - \bar{r}(p)$ are functions of $p$ and respectively have entries

$$\bar{r}_i(p_i) = \frac{r_i^2 p_i}{1 + r_i p_i}, \qquad \tilde{r}_i(p_i) = \frac{r_i}{1 + r_i p_i}. \qquad (71)$$

Henceforth, we will write $s(p,q)$ instead of $s(P,Q)$ whenever we implicitly assume that the eigenbases are optimally aligned according to (69). We do not impose any ordering of the eigenvalues $p_i$, $q_i$, and $r_i$. Instead, we assume, without loss of generality, that they are arranged in such a way that the $s_i$ are non-increasingly ordered.
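Under aligned eigenbases, (70)-(71) reduce the effective SNR to elementwise vector operations. The Python sketch below (example values are hypothetical) evaluates $s(p,q)$ and cross-checks it against the matrix expression with diagonal $R$, $P$ and $Q$, using $\tilde{R} = (R^{-1} + P)^{-1}$ and $\bar{R} = R - \tilde{R}$ as in (104)-(105).

```python
import numpy as np

def profile(p, q, r):
    """Eigenvalue profile s(p, q) for aligned eigenbases, cf. (70)-(71)."""
    p, q, r = (np.asarray(x, dtype=float) for x in (p, q, r))
    r_bar = r**2 * p / (1 + r * p)    # eigenvalues of the estimate covariance
    r_tilde = r / (1 + r * p)         # eigenvalues of the error covariance
    return r_bar * q / (1 + q @ r_tilde)

r = np.array([2/3, 1/3]); p = np.array([1.5, 0.5]); q = np.array([2.0, 1.0])
s = profile(p, q, r)

# Cross-check against the matrix expression with diagonal R, P, Q (common eigenbasis).
R, P, Q = np.diag(r), np.diag(p), np.diag(q)
R_tilde = np.linalg.inv(np.linalg.inv(R) + P)
R_bar_half = np.diag(np.sqrt(np.diag(R - R_tilde)))
S = R_bar_half @ Q @ R_bar_half / (1 + np.trace(Q @ R_tilde))
print(s, np.diag(S))
```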

D. Pareto optimal allocation with energy boost 

Upon optimally aligning the eigenbases according to Theorem VIII.1, we now consider the remaining problem, which consists in jointly optimizing the allocation vector pair $(p, q)$ belonging to a set that constrains the average power radiated by the transmitter array:

$$\Gamma = \left\{ (p,q) \in \mathbb{R}_+^{N_T} \times \mathbb{R}_+^{N_T} \,\middle|\, \mathbf{1}^T p + (T - T_\tau)\mathbf{1}^T q \le T\mu \right\}. \qquad (72)$$

By virtue of Theorem VIII.1, we have $\mathsf{s}(\mathcal{P}\mathcal{Q}) = \mathsf{s}(\Gamma)$. In the following, we devise a procedure for computing the set of all allocations $(p,q)$ that yield points $s(p,q)$ located on the Pareto border $\partial^{+}\mathsf{s}(\mathcal{P}\mathcal{Q}) = \partial^{+}\mathsf{s}(\Gamma)$. Given the monotonicity of the function $s(p,q)$ [cf. (24)], we are certain that any Pareto optimal allocation $(p,q)$ will expend the full power budget, and thus belong to

$$\partial^{+}\Gamma = \left\{ (p,q) \in \Gamma \,\middle|\, \mathbf{1}^T p + (T - T_\tau)\mathbf{1}^T q = T\mu \right\}. \qquad (73)$$

The joint problem (33)/(61) can thus be reformulated once more as

$$\max_{s \in \partial^{+}\mathsf{s}(\partial^{+}\Gamma)} f(s). \qquad (74)$$

Now note that the search set $\partial^{+}\mathsf{s}(\partial^{+}\Gamma)$ is not equal to the set $\mathsf{s}(\partial^{+}\Gamma)$, meaning that it is not sufficient to simply choose some full-power allocation $(p,q)$ in order to obtain a Pareto optimal allocation. Instead, we have the proper inclusion

$$\partial^{+}\mathsf{s}(\partial^{+}\Gamma) \subset \mathsf{s}(\partial^{+}\Gamma). \qquad (75)$$

In fact, any Pareto optimal allocation is a full-power allocation, but the converse is not true. This becomes clear when counting dimensions: the vector $s$ has $N_T$ real entries, so any parametrization of the feasible set $\mathsf{s}(\Gamma)$ with a minimal number of parameters will require at most $N_T$ real parameters. However, the entries of the vector pair $(p,q)$ represent $2N_T$ parameters. Even by replacing $\Gamma$ with $\partial^{+}\Gamma$, which implies the fulfillment of the linear constraint $\sum_i p_i + (T - T_\tau)\sum_i q_i = T\mu$, we only lose one parameter, which still leaves us with $2N_T - 1$ parameters. Thus, at least $N_T - 1$ redundant parameters need to be eliminated. However, a direct elimination based on the explicit expression of $s(p,q)$ in (70) does not seem possible.
The idea for reducing the parameter set so as to efficiently compute Pareto optimal allocations $(p,q)$ is as follows: we choose some vector norm $\|\cdot\|$, then fix a non-negative direction vector $e \ge 0$ that is normalized as $\|e\| = 1$. This normalized vector points into the positive orthant of the $s$-domain and defines a half-line departing from the origin. We then maximize the norm $\|s(p,q)\|$ with respect to the allocation $(p,q)$ under the constraint that $s(p,q)$ points in the direction of $e$. In other terms, we determine the point of the set $\mathsf{s}(\mathcal{P}\mathcal{Q}) = \mathsf{s}(\Gamma)$ that lies farthest away from the origin and is located on the half-line running along $e$.
Fig. 5. Symbolic sketch of the procedure for computing Pareto border points from $\partial^{+}\mathsf{s}(\mathcal{P}\mathcal{Q})$. Said points are parametrized by a unit-norm direction vector $e$.



Formally, the problem at hand can be stated as

$$\max_{(p,q) \in \Gamma} \nu \quad \text{s.t.} \quad s(p,q) = \nu e, \qquad (76)$$

where $\nu = \|s(p,q)\|$ stands for the norm of $s$, while the function $s(p,q)$ is given by (70) as

$$s(p,q) = \frac{\bar{r} \odot q}{1 + \tilde{r}^T q} = \frac{\bar{r} \odot q}{1 + r^T q - \bar{r}^T q}, \qquad (77)$$

and $e$ is some normalized direction vector pointing into the positive orthant, i.e., $e \ge 0$ and $\|e\| = 1$. As usual, the search set $\Gamma$ can be reduced to $\partial^{+}\Gamma$.

When we vary $e$, the set of all points $\nu_{\max} e$ that are determined by this maximization procedure constitutes what we shall call a front border.




Fig. 6. Front border $\partial^{\wedge}\mathcal{A}$ of a closed set $\mathcal{A}$



The front border of a compact set $\mathcal{A} \subset \mathbb{R}^n$ shall be denoted as $\partial^{\wedge}\mathcal{A}$ and be formally defined as

$$\partial^{\wedge}\mathcal{A} = \bigcup_{\substack{e \ge 0\\ \|e\| = 1}} \arg\max_{a \in \mathcal{A},\ a = \nu e} \nu. \qquad (78)$$

Note that certain directions $e$ may yield empty sets $\{a \in \mathcal{A} \mid a = \nu e\}$, so only non-trivial contributions (non-empty sets) should be retained when taking the union (78). As we can easily intuit from comparing Figures 3 and 6, the Pareto border and the front border of a compact set are not identical in general. However, according to the next lemma, they coincide for the set $\mathsf{s}(\Gamma)$.

Lemma VIII.1. The Pareto border and the front border of the set $\mathsf{s}(\Gamma)$ coincide.

Proof: See Appendix I. ■
As a consequence, we can compute the Pareto border by the above-mentioned technique. Let us choose the norm $\|\cdot\|$ to be the 1-norm $\|s\|_1 = \sum_i s_i$, as this will turn out to be a convenient choice. The quantity $\nu$ that is maximized in (76) is the 1-norm of the vector $s(p,q)$, constrained to being colinear with $e$, i.e.,

$$s(p,q) = \nu e, \qquad \nu = \|s(p,q)\|_1 = \frac{\eta}{1 + r^T q - \eta}, \qquad (79)$$

where $\eta$ stands for [cf. (77)]

$$\eta = \|\bar{r} \odot q\|_1 = \bar{r}^T q. \qquad (80)$$

Note that the colinearity constraint $s(p,q) = \nu e$ implies the colinearity $\bar{r} \odot q = \eta e$. Componentwise, the latter reads as [cf. (71)]

$$\frac{r_i^2 p_i q_i}{1 + r_i p_i} = \eta e_i. \qquad (81)$$

Consider $e$ to be fixed. Then we see from (81) that, once $\eta$ is given, $p_i$ and $q_i$ are entirely determined by one another: given any value of $q_i > 0$, the corresponding value of $p_i > 0$ is uniquely determined (as long as $\eta e_i / q_i < r_i$), and conversely, given any value of $p_i > 0$, the value of $q_i > 0$ is uniquely determined. This allows us to effect a (one-to-one) change of parameters: we drop the $q_i$ and replace them by the $e_i$, thus effectively replacing the parameter pair $(p,q) \in \Gamma$ by the new pair $(p,e) \in \mathcal{D}(T\mu) \times \{e \in \mathbb{R}_+^{N_T} \mid \|e\|_1 = 1\}$. From (81), the $q_i$ can now be expressed in terms of $p_i$ and $e_i$ as

$$q_i(p_i, e_i) = \eta e_i \frac{1 + r_i p_i}{r_i^2 p_i}. \qquad (82)$$

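By summing (82) over $i$ and taking into account the energy conservation $\sum_i p_i + (T - T_\tau)\sum_i q_i = T\mu$, we obtain expressions of $\eta$ and of $q_i$ which are functions of $(p,e)$:

$$\eta(p,e) = \frac{T\mu - \mathbf{1}^T p}{T - T_\tau} \left( \sum_{i} e_i \frac{1 + r_i p_i}{r_i^2 p_i} \right)^{-1}, \qquad (83)$$

$$q_i(p,e) = \eta(p,e)\, e_i \frac{1 + r_i p_i}{r_i^2 p_i}. \qquad (84)$$

Consequently, $\nu$ can itself be expressed as a function of $(p,e)$ too [cf. (76)]:

$$\nu(p,e) = \frac{\eta(p,e)}{1 + r^T q(p,e) - \eta(p,e)}. \qquad (85)$$

We can now dismiss the initial problem formulation (76) in favor of the equivalent formulation

$$p^*(e) = \arg\max_{p \in \mathcal{D}(T\mu)} \nu(p,e), \qquad (86)$$

with $\nu(p,e)$ as given in (85). Once the maximizer $p^*(e)$ is determined, we compute the corresponding $q^*(e)$ via (84) as

$$q^*(e) = q(p^*(e), e). \qquad (87)$$

The Pareto border $\partial^{+}\mathsf{s}(\mathcal{P}\mathcal{Q}) = \partial^{+}\mathsf{s}(\Gamma)$ is described in its entirety by the union (see Figure 5)

$$\partial^{+}\mathsf{s}(\Gamma) = \bigcup_{\substack{e \ge 0\\ \|e\|_1 = 1}} s(p^*(e), q^*(e)). \qquad (88)$$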
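The change of parameters (82)-(85) can be coded directly. The Python sketch below evaluates $\eta(p,e)$, $q(p,e)$ and $\nu(p,e)$ and checks two properties that follow from the construction: the allocation exhausts the energy budget in (73), and $s(p, q(p,e)) = \nu(p,e)\, e$ whenever $\|e\|_1 = 1$. The numbers, including the training length $T_\tau = 2$, are hypothetical.

```python
import numpy as np

def profile(p, q, r):
    """s(p, q) from (70)-(71)."""
    r_bar = r**2 * p / (1 + r * p)
    r_tilde = r / (1 + r * p)
    return r_bar * q / (1 + q @ r_tilde)

def eta(p, e, r, T, T_tau, mu):
    """eta(p, e) from (83)."""
    w = e * (1 + r * p) / (r**2 * p)              # q_i / eta, cf. (82)
    return (T * mu - p.sum()) / (T - T_tau) / w.sum()

def q_of(p, e, r, T, T_tau, mu):
    """q(p, e) from (84)."""
    return eta(p, e, r, T, T_tau, mu) * e * (1 + r * p) / (r**2 * p)

def nu(p, e, r, T, T_tau, mu):
    """nu(p, e) from (85)."""
    h = eta(p, e, r, T, T_tau, mu)
    return h / (1 + r @ q_of(p, e, r, T, T_tau, mu) - h)

# Hypothetical numbers (T_tau = 2 is an assumption, not taken from the text).
r = np.array([2/3, 1/3]); e = np.array([0.5, 0.5])
T, T_tau, mu = 10, 2, 1.0
p = np.array([3.0, 2.0])
q = q_of(p, e, r, T, T_tau, mu)
print(profile(p, q, r), nu(p, e, r, T, T_tau, mu) * e)
```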



Definition VIII.1. A function $f: \mathcal{X} \to \mathbb{R}$ is quasi-concave (resp. quasi-convex) on a convex and compact set $\mathcal{X} \subset \mathbb{R}^n$ if it can be represented as a composition

$$f(x) = (g \circ h)(x) \qquad (89)$$

of a concave (resp. convex) function $h: \mathcal{X} \to \mathbb{R}$ and a non-decreasing function $g: \mathbb{R} \to \mathbb{R}$.

Lemma VIII.2. The function $\nu(p,e)$ is quasi-concave in $p$.

Proof: See Appendix J. ■

This lemma renders (86) a quasi-convex problem, which can be solved efficiently.

Figures 7(a) and 7(b) illustrate an example of a function $\nu(p,e)$ for $N_T = 2$ transmit antennas, channel coherence $T = 10$ and SNR $\mu = 1$, $r = [2/3\ \ 1/3]^T$ and $e = [1/2\ \ 1/2]^T$. The quasi-concavity (but non-concavity) can be well appreciated in said plot, since $\nu(p,e)$ appears to be convex in $p$ near the borders of its triangular domain $\mathcal{D}(T\mu)$, while it is concave in an inner region. Notwithstanding this change of curvature, the function is globally quasi-concave in $p$, since all upper contour sets, as illustrated in Figure 7(b), are convex.
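For small $N_T$, Problem (86) can be explored by brute force, a simple (and inefficient) substitute for a dedicated quasi-convex solver. The Python sketch below evaluates $\nu(p,e)$ on a grid for the parameters of Figure 7. The training length $T_\tau = 2$ and the domain $\mathcal{D}(T\mu) = \{p > 0 : \mathbf{1}^T p < T\mu\}$ are assumptions, since neither is specified in this section.

```python
import numpy as np

def nu_grid(P1, P2, e, r, T, T_tau, mu):
    """nu(p, e) of (83)-(85), evaluated on a grid of 2-antenna pilot allocations."""
    p = np.stack([P1, P2], axis=-1)                        # shape (..., 2)
    w = e * (1 + r * p) / (r**2 * p)                       # q_i / eta, cf. (82)
    eta = (T * mu - p.sum(-1)) / (T - T_tau) / w.sum(-1)   # (83)
    q = eta[..., None] * w                                 # (84)
    return eta / (1 + (q * r).sum(-1) - eta)               # (85)

# Parameters of Figure 7; T_tau = 2 is an assumption (not stated in the text).
r = np.array([2/3, 1/3]); e = np.array([0.5, 0.5])
T, T_tau, mu = 10, 2, 1.0

g = np.linspace(0, T * mu, 401)[1:-1]            # strictly positive pilot energies
P1, P2 = np.meshgrid(g, g, indexing="ij")
inside = P1 + P2 < T * mu                        # assumed domain D(T mu)
with np.errstate(divide="ignore", invalid="ignore"):
    V = np.where(inside, nu_grid(P1, P2, e, r, T, T_tau, mu), -np.inf)

i, j = np.unravel_index(np.argmax(V), V.shape)
p_best = np.array([P1[i, j], P2[i, j]])
print(p_best, V[i, j])
```

A grid is robust to the changes of curvature visible in Figure 7, but its cost grows exponentially with $N_T$; bisection on the convex upper contour sets is the natural way to exploit Lemma VIII.2 instead.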
(a) $\nu(p,e)$ as a function of $p_1$ and $p_2$

(b) Contour plot of the function $\nu(p,e)$ from Figure 7(a)

Fig. 7. Three-dimensional representation and corresponding contour plot of a function $\nu(p,e)$.



E. Pareto optimal allocation without energy boost

The problem considered so far becomes a different one if the pilot energy $\mathbf{1}^T p$ and the transmit power $\mathbf{1}^T q$ are individually limited by fixed budgets (i.e., no energy boost is permitted). Instead of a sum energy constraint per fading block (31) that is shared between the tasks of channel estimation and data transmission, let us consider a pair of constraints

$$\mathbf{1}^T p \le \mu_P, \qquad \mathbf{1}^T q \le \mu_Q, \qquad (90)$$

where the so-called budgets $\mu_P$ and $\mu_Q$ are two given constants. Repeating the approach taken in Subsection VIII-D, we define a set

$$\bar{\Gamma} = \left\{ (p,q) \in \mathbb{R}_+^{N_T} \times \mathbb{R}_+^{N_T} \,\middle|\, \mathbf{1}^T p \le \mu_P \text{ and } \mathbf{1}^T q \le \mu_Q \right\} \qquad (91)$$

and seek to maximize the 1-norm

$$\bar{\nu} = \|s(p,q)\|_1 = \frac{\bar{\eta}}{1 + r^T q - \bar{\eta}} \qquad (92)$$

over $\bar{\Gamma}$ under the colinearity constraint $s(p,q) = \bar{\nu} e$, where $\bar{\eta} = \|\bar{r} \odot q\|_1$ [cf. (79)]. Similarly to (83)-(84), for a fixed direction $e$, the quantities $\bar{\eta}$ and $\bar{q}$ can be expressed as functions of $(p,e)$, namely

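$$\bar{\eta}(p,e) = \mu_Q \left( \sum_i e_i \frac{1 + r_i p_i}{r_i^2 p_i} \right)^{-1}, \qquad (93)$$

$$\bar{q}_i(p,e) = \bar{\eta}(p,e)\, e_i \frac{1 + r_i p_i}{r_i^2 p_i}. \qquad (94)$$

With these two functions, $\bar{\nu}$ can also be expressed as a function of $p$:

$$\bar{\nu}(p,e) = \frac{\bar{\eta}(p,e)}{1 + r^T \bar{q}(p,e) - \bar{\eta}(p,e)}. \qquad (95)$$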



Note that, in contrast to (83)-(84), in (93)-(94) the factor $\frac{T\mu - \mathbf{1}^T p}{T - T_\tau}$ has been replaced by the constant $\mu_Q$. This change makes the problem

$$\bar{p}^*(e) = \arg\max_{p \in \partial^{+}\mathcal{P}(\mu_P)} \bar{\nu}(p,e) \qquad (96)$$

even more amenable than its counterpart with energy boost (86). In fact, while (86) is a quasi-convex problem according to Lemma VIII.2, the problem (96) is convex and admits the closed-form solution

$$\bar{p}_i^*(e) = \mu_P \, \frac{\sqrt{e_i(1 + \mu_Q r_i)}\,/\,r_i}{\sum_j \sqrt{e_j(1 + \mu_Q r_j)}\,/\,r_j}. \qquad (97)$$
A detailed derivation is provided in Appendix K. Again, as in (88), the entire Pareto border $\partial^{+}\mathsf{s}(\mathcal{P},\mathcal{Q}) = \partial^{+}\mathsf{s}(\bar{\Gamma})$ is parametrized by $e$ as

$$\partial^{+}\mathsf{s}(\bar{\Gamma}) = \bigcup_{\substack{e \ge 0\\ \|e\|_1 = 1}} s(\bar{p}^*(e), \bar{q}^*(e)), \qquad (98)$$

where $\bar{q}^*(e) = \bar{q}(\bar{p}^*(e), e) = [\bar{q}_1(\bar{p}^*(e), e), \ldots, \bar{q}_{N_T}(\bar{p}^*(e), e)]^T$.
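The closed form (97) can be compared with a brute-force search along the full-budget segment $p_1 + p_2 = \mu_P$. In the Python sketch below, the function names and example values are hypothetical; $\bar{\nu}$ and $\bar{q}$ follow (93)-(95).

```python
import numpy as np

def p_bar_star(e, r, mu_P, mu_Q):
    """Closed-form pilot allocation (97): p_i proportional to sqrt(e_i (1 + mu_Q r_i)) / r_i."""
    c = np.sqrt(e * (1 + mu_Q * r)) / r
    return mu_P * c / c.sum()

def nu_bar(p, e, r, mu_Q):
    """Return nu_bar(p, e) from (95) together with q_bar(p, e) from (93)-(94)."""
    w = e * (1 + r * p) / (r**2 * p)
    eta = mu_Q / w.sum()
    q = eta * w
    return eta / (1 + r @ q - eta), q

# Hypothetical example values.
r = np.array([2/3, 1/3]); e = np.array([0.7, 0.3])
mu_P, mu_Q = 2.0, 3.0
p_star = p_bar_star(e, r, mu_P, mu_Q)
nu_star, q_star = nu_bar(p_star, e, r, mu_Q)

# Brute-force comparison along the full-budget segment p_1 + p_2 = mu_P.
x = np.linspace(0, 1, 2001)[1:-1]
nu_line = [nu_bar(np.array([t * mu_P, (1 - t) * mu_P]), e, r, mu_Q)[0] for t in x]
print(p_star, nu_star, max(nu_line))
```

The resulting pair $(\bar{p}^*(e), \bar{q}^*(e))$ exhausts both budgets and yields a profile colinear with $e$, i.e., a point of (98).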

It is worth mentioning that for $N_T = 2$ transmit antennas, a simple closed-form parametrization of the Pareto border $\partial^{+}\mathsf{s}(\bar{\Gamma})$ was already reported in [2, Equation (31)] as the solution to an envelope equation. Said solution can be recovered by particularizing (98) to $N_T = 2$ and applying an appropriate change of parameters.



IX. Iterative Joint Design 



Since we know from Theorem VIII.1 that the eigenbases are aligned as $\mathcal{C}(U_P) = \mathcal{C}(U_Q) \subseteq \mathcal{C}(U_R)$ at the joint optimum, we can align the eigenbases accordingly and keep working in the vector domain alone.

The interest of Problems (34) and (52) is that, once we know how to compute their solutions with respect to some utility $f \in \mathcal{F}$, a natural way of tackling the joint design of $p$ and $q$ is to alternate between the two problems in the fashion of a block coordinate ascent:

$$\begin{cases} p_{n+1} = p^*(q_n) \\ q_{n+1} = q^*(p_{n+1}) \end{cases} \qquad \text{or} \qquad \begin{cases} q_{n+1} = q^*(p_n) \\ p_{n+1} = p^*(q_{n+1}) \end{cases} \qquad (99)$$

This procedure converges monotonically toward a fixed point of the iteration $(p_{n+1}, q_{n+1}) = (p^*(q_n), q^*(p_n))$, or enters a cycle. However, this simple iteration is not sufficient to reach a global optimum of the joint problem, the reason being that there are too many parameters contained in the pair $(p,q)$, as already observed in Subsection VIII-D. Therefore, we need to insert an additional step in the iteration that readjusts the allocation $(p,q)$ so as to remain Pareto optimal. This step can be performed with the methods for computing the Pareto border developed in Subsection VIII-D when allowing energy boost, and in Subsection VIII-E when precluding energy boost. Roughly speaking, the algorithm should cycle through the following three steps:

1) Optimize p for a prescribed q 

2) Optimize q for a prescribed p 

3) Adjust (p, q) to be Pareto optimal 



A. With energy boost 

If $p$ and $q$ are constrained by an average power constraint such as (31), i.e.,

$$\mathbf{1}^T p + (T - T_\tau)\mathbf{1}^T q \le T\mu, \qquad (100)$$

then the algorithm reads as follows.



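Algorithm 1 Joint optimization with energy boost

1: $p_0 \leftarrow \ldots\, \mathbf{1}_{N_T}$
2: $q_0 \leftarrow \ldots$
3: $n \leftarrow 0$
4: repeat
5:   $p' \leftarrow \arg\max_{p \in \mathcal{P}(\mathbf{1}^T p_n)} f(s(p, q_n))$
6:   $q' \leftarrow \arg\max_{q \in \mathcal{Q}(\mathbf{1}^T q_n)} f(s(p', q))$
7:   $e_{n+1} \leftarrow s(p',q') / \|s(p',q')\|_1$
8:   $p_{n+1} \leftarrow \arg\max_{p \in \mathcal{D}(T\mu)} \nu(p, e_{n+1})$
9:   $q_{n+1} \leftarrow q(p_{n+1}, e_{n+1})$
10:  $s_{n+1} \leftarrow e_{n+1}\, \nu(p_{n+1}, e_{n+1})$
11:  $n \leftarrow n + 1$
12: until $f(s_n) - f(s_{n-1}) < \epsilon$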



For concave (resp. quasi-concave) utilities $f \in \mathcal{F}$, Steps 5 and 6 were shown to be convex (resp. hidden convex) optimizations in Section VII and Subsection VI-C, respectively. The computation of Steps 7 through 10 has been detailed in Subsection VIII-D, wherein Step 8 was shown to be a quasi-convex optimization.
B. Without energy boost

If $p$ and $q$ are constrained by two individual budget constraints $\mathbf{1}^T p \le \mu_P$ and $\mathbf{1}^T q \le \mu_Q$ as in (90) in Subsection VIII-E, then the algorithm looks similar to the above Algorithm 1, except that the steps that project an allocation $(p_n, q_n)$ onto the Pareto border (Steps 8 through 10) make use of the barred functions $\bar{\nu}(\cdot,\cdot)$ and $\bar{q}(\cdot,\cdot)$ instead of $\nu(\cdot,\cdot)$ and $q(\cdot,\cdot)$, and that the search sets of the optimizations in Steps 5, 6 and 8 need to be changed accordingly. Moreover, the initial values $p_0$ and $q_0$ need to be made consistent with the constraints $\mathbf{1}^T p \le \mu_P$ and $\mathbf{1}^T q \le \mu_Q$.

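Algorithm 2 Joint optimization without energy boost

1: $p_0 \leftarrow \ldots$
2: $q_0 \leftarrow \ldots$
3: $n \leftarrow 0$
4: repeat
5:   $p' \leftarrow \arg\max_{p \in \partial^{+}\mathcal{P}(\mu_P)} f(s(p, q_n))$
6:   $q' \leftarrow \arg\max_{q \in \partial^{+}\mathcal{Q}(\mu_Q)} f(s(p', q))$
7:   $e_{n+1} \leftarrow s(p',q') / \|s(p',q')\|_1$
8:   $p_{n+1} \leftarrow \bar{p}^*(e_{n+1})$ [cf. (97)]
9:   $q_{n+1} \leftarrow \bar{q}(p_{n+1}, e_{n+1})$
10:  $s_{n+1} \leftarrow e_{n+1}\, \bar{\nu}(p_{n+1}, e_{n+1})$
11:  $n \leftarrow n + 1$
12: until $f(s_n) - f(s_{n-1}) < \epsilon$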



Similarly to Algorithm 1, for concave (resp. quasi-concave) utilities $f \in \mathcal{F}$, Steps 5 and 6 are convex (resp. hidden convex) optimizations (cf. Section VII and Subsection VI-C, respectively). Steps 7 through 10 are detailed in Subsection VIII-E and can be solved in closed form.
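To make the structure of Algorithm 2 concrete, the following Python sketch runs it for $N_T = 2$. Several ingredients are assumptions for illustration only: the utility $f(s) = \sum_i \log(1 + s_i)$ (i.e., $\log\det(I + S)$, which is concave in $S$), the uniform initialization, and a one-dimensional grid search over the full-budget segments standing in for the convex solvers of Steps 5 and 6. Steps 7 to 10 follow (93)-(97).

```python
import numpy as np

def profile(p, q, r):
    """s(p, q) from (70)-(71)."""
    r_bar = r**2 * p / (1 + r * p)
    r_tilde = r / (1 + r * p)
    return r_bar * q / (1 + q @ r_tilde)

def f(s):
    """Hypothetical concave utility: log det(I + S) = sum_i log(1 + s_i)."""
    return np.sum(np.log1p(s))

def pareto_step(e, r, mu_P, mu_Q):
    """Steps 8-10: closed-form projection onto the Pareto border, cf. (93)-(97)."""
    c = np.sqrt(e * (1 + mu_Q * r)) / r
    p = mu_P * c / c.sum()
    w = e * (1 + r * p) / (r**2 * p)
    eta = mu_Q / w.sum()
    q = eta * w
    return p, q, e * eta / (1 + r @ q - eta)

def segment(budget, n=401):
    """Interior grid of {x > 0 : x_1 + x_2 = budget} for N_T = 2."""
    t = np.linspace(0, 1, n + 2)[1:-1]
    return np.stack([t * budget, (1 - t) * budget], axis=1)

def algorithm2(r, mu_P, mu_Q, tol=1e-9, max_iter=30):
    p = np.full(2, mu_P / 2)          # assumed uniform initialization
    q = np.full(2, mu_Q / 2)
    history = [f(profile(p, q, r))]
    for _ in range(max_iter):
        # Steps 5-6: a grid search stands in for the convex subproblems.
        p1 = max(segment(mu_P), key=lambda x: f(profile(x, q, r)))
        q1 = max(segment(mu_Q), key=lambda x: f(profile(p1, x, r)))
        s1 = profile(p1, q1, r)
        e = s1 / s1.sum()                           # Step 7
        p, q, s = pareto_step(e, r, mu_P, mu_Q)     # Steps 8-10
        history.append(f(s))
        if history[-1] - history[-2] < tol:         # Step 12
            break
    return p, q, history

p, q, history = algorithm2(np.array([2/3, 1/3]), mu_P=2.0, mu_Q=3.0)
print(p, q, history)
```

Because the projection in Steps 8 to 10 keeps the direction $e_{n+1}$ of $s(p', q')$ while maximizing its 1-norm, the new profile dominates $s(p', q')$ componentwise, so a non-decreasing utility cannot decrease in that step.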



C. Optimal training duration 

We mentioned in the problem statement in Section V that we could leave aside the problem of tuning the length of the training sequence, since its optimization amounts to an exhaustive search over the set $\{1, \ldots, N_T\}$. Indeed, to tackle the full-fledged joint optimization problem stated in Equation (32), we simply need to wrap Algorithm 1 or 2 into an extra loop. First, we need to internalize the time penalty into the utility function by redefining $f$ as

$$f(s) \leftarrow \frac{T - T_\tau}{T}\, I(s). \qquad (101)$$

Note that the functions $\nu(\cdot,\cdot)$ and $\bar{\nu}(\cdot,\cdot)$ depend on the parameter $T_\tau$ and should be updated accordingly with the loop count. The full-fledged algorithm then reads as follows:



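Algorithm 3 Full-fledged joint optimization

1: $T_\tau \leftarrow 1$
2: repeat
3:   … Algorithm 1 or 2 …
4:   $(p^*(T_\tau), q^*(T_\tau)) \leftarrow (p_n, q_n)$
5:   $T_\tau \leftarrow T_\tau + 1$
6: until $T_\tau > N_T$
7: $(p^*, q^*) \leftarrow \arg\max_{i \in \{1, \ldots, N_T\}} f(s(p^*(i), q^*(i)))$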
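The outer loop of Algorithm 3 is an exhaustive search. A minimal Python sketch, assuming a hypothetical callable `inner_solve` that runs Algorithm 1 or 2 for a given $T_\tau$ and returns the mutual information $I$ attained at its output; the loop bound ensures that $T_\tau = N_T$ is included, and the time penalty follows (101).

```python
def full_fledged(inner_solve, N_T, T):
    """Exhaustive search over T_tau in {1, ..., N_T}, in the spirit of Algorithm 3.

    inner_solve(T_tau) is a hypothetical callable standing in for Algorithm 1 or 2;
    it must return (p, q, I_value) for the given training length.
    """
    results = {}
    for T_tau in range(1, N_T + 1):            # loop runs through T_tau = N_T
        p, q, I_value = inner_solve(T_tau)
        penalized = (T - T_tau) / T * I_value  # time penalty of (101)
        results[T_tau] = (p, q, penalized)
    best = max(results, key=lambda k: results[k][2])
    return best, results[best]

# Toy stand-in whose mutual information simply grows linearly with T_tau.
def toy(T_tau):
    return None, None, float(T_tau)

print(full_fledged(toy, N_T=4, T=6))
```

With this toy stand-in, a short coherence time favors an interior training length, whereas a longer one favors $T_\tau = N_T$.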



X. Simulations 



Figure 8 shows how Algorithm 1 (with energy boost and for fixed $T_\tau = 2$) converges to the jointly optimal solution for the utility function $f(s) = I(s)$. The parameters chosen in this simulation are $T = 10$, $\mu = 10$ (i.e., 10 dB), $(N_T, N_R) = (2,2)$, and $(r_1, r_2) = (\ldots)$.

Figures 9(a) and 9(b) respectively show the quantities [cf. (20)]

$$f_{(a)}(s) = \frac{T - T_\tau}{T}\, I(s), \qquad -f_{(b)}(s) = r - N_R + \operatorname{tr} \mathbb{E}\left[(I + W S W^H)^{-1}\right] \qquad (102)$$
plotted against the SNR $\mu$ (in decibels) for the same $2 \times 2$ system as for Figure 8, i.e., $(r_1, r_2) = (\ldots)$ and $T = 10$. The former utility $f_{(a)}$ represents an achievable rate, since it is the mutual information $I(s)$ weighted with a training overhead factor [see also formulation (32) of the full-fledged optimization problem]. The other utility $f_{(b)}$ is the negative of the bound on the per-symbol MMSE derived in (20) [see also Utility 9 in Table I]. For this utility, we have fixed $r = T_\tau = 2$. In each of the two figures, three curves are plotted for comparison: the performance obtained with full-fledged joint optimization (32), computed with Algorithm 1 [including an exhaustive search over $T_\tau$ in the case of Figure 9(a), cf. Algorithm 3]; the performance obtained with precoder optimization alone; and the performance obtained without any optimization, i.e., with $P_0$ and $Q_0$ chosen as scaled identity matrices. The relative gains in mutual information are clearly noticeable, especially for low and moderate SNR values. For higher SNR, instead, these gains are far less significant. In fact, at high SNR, if the channel coherence $T$ is at least twice the number $N_T$ of available transmit antennas, the optimal pilot-precoder pair tends toward the non-optimized pair $(P_0, Q_0)$.

(a) Convergence of the profile $s$ towards the optimizer, located on the Pareto border (thick black line) of the feasible set $\mathsf{s}(\Gamma)$

(b) Convergence of the utility function $I(s)$ towards the optimum

Fig. 8. Convergence of Algorithm 1 at an SNR of 10 dB ($\mu = 10$) for an exemplary $2 \times 2$ MIMO channel, both in the $s$-domain [Fig. 8(a)] and in terms of the utility value $I(s)$ [Fig. 8(b)].

(b) Achievable mean-square error as a function of the SNR

Fig. 9. Two different utility functions against the SNR parameter for an exemplary $2 \times 2$ MIMO system. For both utilities, the performance of full-fledged optimization is compared to a partial optimization (precoder only) and no optimization at all.

Appendix 

A. Power monotonicity 

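We first prove that the function $(P,Q) \mapsto S(P,Q)$ is matrix-monotonic in the first argument, meaning that

$$0 \preceq P \preceq P' \implies S(P,Q) \preceq S(P',Q). \qquad (103)$$

Denote

$$\tilde{R} = (R^{-1} + P)^{-1}, \qquad \tilde{R}' = (R^{-1} + P')^{-1} \qquad (104)$$

and

$$\bar{R} = R - \tilde{R}, \qquad \bar{R}' = R - \tilde{R}'. \qquad (105)$$

Then $P \preceq P'$ obviously implies $\tilde{R} \succeq \tilde{R}'$ and $\bar{R} \preceq \bar{R}'$. This further implies $\operatorname{tr}(Q\tilde{R}) \ge \operatorname{tr}(Q\tilde{R}')$ and $\bar{R}^{1/2} \preceq (\bar{R}')^{1/2}$, from which finally follows

$$\frac{\bar{R}^{1/2} Q \bar{R}^{1/2}}{1 + \operatorname{tr}(Q\tilde{R})} \preceq \frac{(\bar{R}')^{1/2} Q (\bar{R}')^{1/2}}{1 + \operatorname{tr}(Q\tilde{R}')},$$

which is nothing else than $S(P,Q) \preceq S(P',Q)$. The monotonicity (103) implies the weaker property

$$0 \le k \le k' \implies S(kP, Q) \preceq S(k'P, Q).$$

Similarly, we have

$$0 \le k \le k' \implies S(P, kQ) \preceq S(P, k'Q).$$

This is because

$$S(P, kQ) = \frac{k\, \bar{R}^{1/2} Q \bar{R}^{1/2}}{1 + k \operatorname{tr}(Q\tilde{R})} \preceq \frac{k'\, \bar{R}^{1/2} Q \bar{R}^{1/2}}{1 + k' \operatorname{tr}(Q\tilde{R})} = S(P, k'Q),$$

owing to the fact that $k \mapsto \frac{k}{1 + k \operatorname{tr}(Q\tilde{R})}$ is monotonically increasing in $k$.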
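The scaling property can be checked numerically. The Python sketch below (random real-valued $R$, $P$, $Q$ chosen for illustration) forms $S(P, kQ)$ for increasing $k$ and reports the smallest eigenvalue of each successive difference, which should be nonnegative up to rounding, since each difference is a nonnegative multiple of $\bar{R}^{1/2} Q \bar{R}^{1/2}$.

```python
import numpy as np

def psd_sqrt(M):
    """Principal square root of a real symmetric PSD matrix."""
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def effective_snr(R, P, Q):
    """S(P, Q) = R_bar^{1/2} Q R_bar^{1/2} / (1 + tr(Q R_tilde)), with (104)-(105)."""
    R_tilde = np.linalg.inv(np.linalg.inv(R) + P)
    R_half = psd_sqrt(R - R_tilde)
    return R_half @ Q @ R_half / (1.0 + np.trace(Q @ R_tilde))

rng = np.random.default_rng(2)
N = 3
G = rng.standard_normal((N, N)); R = G @ G.T + np.eye(N)
G = rng.standard_normal((N, N)); P = G @ G.T
G = rng.standard_normal((N, N)); Q = G @ G.T

ks = [0.5, 1.0, 2.0, 4.0]
S_list = [effective_snr(R, P, k * Q) for k in ks]
# Smallest eigenvalue of each difference S(P, k'Q) - S(P, kQ).
gaps = [np.linalg.eigvalsh(S2 - S1).min() for S1, S2 in zip(S_list, S_list[1:])]
print(gaps)
```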
B. Examples of utilities 



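TABLE I
Examples of utilities from the class $\mathcal F$

| # | Utility | Curvature in $S$ |
|---|---|---|
| 1 | $I(S)$ | concave |
| 2 | $\operatorname{tr}(S)$ | linear |
| 3 | $\det(S)$ | log-concave |
| 4 | $\operatorname{tr}(S^{-1})^{-1}$ | log-concave |
| 5 | $\det(I + \nu S)$ with $\nu > 0$ | log-concave |
| 6 | $\mathbb E\det(I + W S W^\dagger)$ | log-concave |
| 7 | $\mathbb E\log\det(W S W^\dagger)$ for $N_T \ge N_R$ | concave |
| 8 | $\mathbb E\det(W S W^\dagger)$ for $N_T \ge N_R$ | log-concave |
| 9 | $-\operatorname{tr}\mathbb E\{(I + W S W^\dagger)^{-1}\}$ | concave |
| 10 | $\operatorname{tr}\mathbb E\{(S^{-1} + W^\dagger W)^{-1}\}$ for $\det(S) \neq 0$ | concave |
| 11 | $\Pr(\det(I + W S W^\dagger) > \eta)$ | -/- |
| 12 | $\Pr(\log\det(W S W^\dagger) > \eta)$ for $N_T \ge N_R$ | -/- |
| 13 | $\Pr(\det(W S W^\dagger) > \eta)$ for $N_T \ge N_R$ | -/- |
| 14 | $\Pr(-\operatorname{tr}\{(I + W S W^\dagger)^{-1}\} > \eta)$ | -/- |
| 15 | $\Pr(\operatorname{tr}\{(S^{-1} + W^\dagger W)^{-1}\} > \eta)$ | -/- |
| 16 | $\lVert S\rVert_F$ | convex |
| 17 | $\lambda_{\max}(S)$ | convex |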
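The deterministic entries of Table I can be evaluated directly from the eigenvalues of $S$, which also makes the symmetry (permutation invariance) of these utilities explicit. The sketch below is illustrative only: the function name, the example matrix and the default $\nu = 1$ are assumptions, and the expectation- or probability-based utilities (rows 6 to 15) would additionally require a statistical model for $W$.

```python
import numpy as np

def deterministic_utilities(S, nu=1.0):
    """Evaluate the deterministic utilities of Table I for a symmetric S."""
    lam = np.linalg.eigvalsh(S)        # these utilities depend on S only via lam
    return {
        "tr(S)": lam.sum(),
        "det(S)": lam.prod(),
        "tr(S^-1)^-1": 1.0 / np.sum(1.0 / lam),     # requires S positive definite
        "det(I + nu S)": np.prod(1.0 + nu * lam),
        "||S||_F": np.sqrt(np.sum(lam ** 2)),
        "lambda_max(S)": lam.max(),
    }

S = np.diag([2.0, 1.0])
u = deterministic_utilities(S)
for name, value in u.items():
    print(f"{name:>15}: {value:.4f}")
```

Reordering the diagonal of $S$ (e.g. $\operatorname{diag}(1, 2)$) leaves every value unchanged.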



In the following, we provide a few examples illustrating from which bounds or approximations of the mutual information $I(S)$ the above utilities may arise.

a) Utility 2: A simple upper bound on $I(S)$ is obtained using the fact that $\mathbb E[W S W^\dagger] = \operatorname{tr}(S) N_T I$ and by applying Jensen's inequality to the concave log-determinant:

$$I(S) \le \sum_{i=1}^{N_R} \log\big(1 + \operatorname{tr}(S) N_T\big). \tag{111}$$



b) Utility 5 with $\nu = N_T N_R$: Using the determinant identity $\det(I + AB) = \det(I + BA)$ to write $I(S) = \mathbb E\log\det(I + S W^\dagger W)$, and applying Jensen's inequality, we get another upper bound:

$$I(S) \le \log\det(I + N_T N_R S). \tag{112}$$

Here, we have used $\mathbb E[W^\dagger W] = N_T N_R I$.
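The two Jensen-type upper bounds (111) and (112) can be checked by Monte Carlo simulation. The sketch assumes (hypothetically) that $W$ is $N_R \times N_T$ with i.i.d. $\mathcal{CN}(0,\sigma^2)$ entries, so that $\mathbb E[WSW^\dagger] = \sigma^2\operatorname{tr}(S) I$ and $\mathbb E[W^\dagger W] = \sigma^2 N_R I$; the factors $N_T$ appearing in (111)-(112) then correspond to $\sigma^2 = N_T$. The antenna numbers, $S$ and the sample size are arbitrary example values, and logarithms are natural.

```python
import numpy as np

rng = np.random.default_rng(1)
N_R, N_T = 2, 3
sigma2 = float(N_T)                  # entry variance; sigma2 = N_T mirrors (111)-(112)
S = np.diag([1.0, 0.5, 0.2])         # example effective SNR matrix
n_mc = 20000

# n_mc independent N_R x N_T matrices W with i.i.d. CN(0, sigma2) entries
W = np.sqrt(sigma2 / 2) * (rng.standard_normal((n_mc, N_R, N_T))
                           + 1j * rng.standard_normal((n_mc, N_R, N_T)))
X = W @ S @ np.conj(np.swapaxes(W, 1, 2))          # W S W^dagger, per sample
_, logdets = np.linalg.slogdet(np.eye(N_R) + X)
I_mc = logdets.mean()                               # Monte Carlo estimate of I(S)

bound_111 = N_R * np.log(1.0 + sigma2 * np.trace(S))              # (111)
_, bound_112 = np.linalg.slogdet(np.eye(N_T) + sigma2 * N_R * S)  # (112)
print(f"I(S) ~ {I_mc:.3f}, (111): {bound_111:.3f}, (112): {bound_112:.3f}")
```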
c) Utilities 6 and 11: By applying Jensen's inequality to the concave log function, we get the upper bound

$$I(S) \le \log\mathbb E[\det(I + W S W^\dagger)]. \tag{113}$$

d) Utilities 3, 7 and 12: We can lower bound $I(S)$ by removing the identity matrix inside the log-determinant. Depending on the sizes of the antenna arrays, this gives us a bound $I(S) \ge \tilde I(S)$ with

$$\tilde I(S) = \begin{cases} \mathbb E\log\det(W S W^\dagger) & \text{for } N_T \ge N_R, \\ \mathbb E\log\det(S W^\dagger W) & \text{for } N_T \le N_R. \end{cases} \tag{114}$$

The former case (i.e., $N_T \ge N_R$) justifies utilities 7 and 12. In the latter case (i.e., $N_T \le N_R$), note that

$$\tilde I(S) = \log\det(S) + \mathbb E\log\det(W^\dagger W) \tag{115}$$

leads to utility 3. Clearly, $\tilde I(S)$ is good as an approximation of $I(S)$ at high SNR, and was used as such in [18], [19]. Let us also mention the tighter lower bound [20]

$$I(S) \ge N_R \log\Big(1 + \exp\big(\tfrac{1}{N_R}\tilde I(S)\big)\Big), \tag{116}$$

the derivation of which makes use of the Minkowski inequality for determinants.
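The chain $\tilde I(S) \le N_R\log(1 + \exp(\tilde I(S)/N_R)) \le I(S)$ for $N_T \ge N_R$ can likewise be illustrated numerically. In the sketch below, unit-variance Gaussian $W$ and the chosen $S$ are assumptions (the inequalities do not depend on the scaling of $W$). Because the per-sample Minkowski inequality and Jensen's inequality for the convex map $x \mapsto N_R\log(1 + e^{x/N_R})$ also hold for the empirical distribution, the printed ordering does not depend on Monte Carlo luck.

```python
import numpy as np

rng = np.random.default_rng(2)
N_R, N_T = 2, 3                      # N_T >= N_R, so W S W^dagger is invertible a.s.
S = np.diag([1.0, 0.5, 0.2])         # example effective SNR matrix
n_mc = 20000

W = (rng.standard_normal((n_mc, N_R, N_T))
     + 1j * rng.standard_normal((n_mc, N_R, N_T))) / np.sqrt(2)   # CN(0, 1) entries
X = W @ S @ np.conj(np.swapaxes(W, 1, 2))
_, ld_I_plus_X = np.linalg.slogdet(np.eye(N_R) + X)
_, ld_X = np.linalg.slogdet(X)

I_mc = ld_I_plus_X.mean()                      # estimate of I(S)
I_low = ld_X.mean()                            # estimate of the lower bound (114), N_T >= N_R
I_116 = N_R * np.log1p(np.exp(I_low / N_R))    # lower bound (116)
print(f"(114): {I_low:.3f} <= (116): {I_116:.3f} <= I(S) ~ {I_mc:.3f}")
```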
C. Proof of Lemma VI.1

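Suppose we are in the first situation, i.e., $A$ has full column rank. Then $A$ has a left pseudoinverse $A^+ = (A^\dagger A)^{-1} A^\dagger$, which can be used to define the inverse function $g^{-1}$. Let $Z = g(X)$ be the image of $X$. Given $Z$, one obtains $X$ by isolating it via left-multiplication with $A^+$, right-multiplication with $A^{+\dagger}$, and appropriate scaling:

$$A^+ Z A^{+\dagger}\big(1 + \operatorname{tr}(BX)\big) = X. \tag{117}$$

Left-multiplying (117) with $B$ and taking the trace yields

$$\operatorname{tr}(A^{+\dagger} B A^+ Z)\big(1 + \operatorname{tr}(BX)\big) = \operatorname{tr}(BX), \tag{118}$$

or equivalently,

$$1 + \operatorname{tr}(BX) = \frac{1}{1 - \operatorname{tr}(A^{+\dagger} B A^+ Z)}. \tag{119}$$

By combining (117) with (119), we recover the pre-image $X = g^{-1}(Z)$, and see that the inverse function $g^{-1}$ is linear fractional with parameters $A^+$ and $-A^{+\dagger} B A^+$.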

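We now suppose that we are in the second situation, i.e., $A$ has full row rank and $\operatorname{range}(X) = \operatorname{range}(A^\dagger)$ for every element $X$ of $\mathcal X$. Due to the latter constraint on the span of $X$, we can write any $X \in \mathcal X$ as $X = A^\dagger \check X A$, with $\check X$ given by the inverse relation $\check X = A^{+\dagger} X A^+$, where $A^+ = A^\dagger (A A^\dagger)^{-1}$ denotes the right pseudoinverse of $A$. The function $g$ can be represented as

$$g(X) = \frac{A X A^\dagger}{1 + \operatorname{tr}(BX)} = \frac{\check A \check X \check A^\dagger}{1 + \operatorname{tr}(\check B \check X)} \tag{120}$$

with the abbreviations $\check A = A A^\dagger$ and $\check B = A B A^\dagger$. Since $\check A = A A^\dagger$ has full rank (because $A$ has full row rank), the function $g$ appears as an injective linear fractional function of $\check X$ with parameters $\check A$ and $\check B$, whose inverse, according to the findings above, is linear fractional with parameters $\check A^{-1}$ and $-\check A^{-\dagger} \check B \check A^{-1}$. Denoting as $Z = g(X)$ the image of $X$ under the function $g$, we can thus recover the pre-image $X$ from $Z$ as

$$X = A^\dagger \check X A = A^\dagger\, \frac{\check A^{-1} Z \check A^{-\dagger}}{1 - \operatorname{tr}(\check A^{-\dagger} \check B \check A^{-1} Z)}\, A = \frac{A^+ Z A^{+\dagger}}{1 - \operatorname{tr}(A^{+\dagger} B A^+ Z)}. \tag{121}$$

Consequently, the inverse is linear fractional with parameters $A^+$ and $-A^{+\dagger} B A^+$.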
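The inversion formula for the full-column-rank case can be verified numerically: applying the linear fractional map with parameters $A^+$ and $-A^{+\dagger} B A^+$ to $Z = g(X)$ should return $X$. This is a minimal real-valued sketch; `lin_frac` is a hypothetical helper, and $B$ and $X$ are drawn positive semidefinite so that the denominators stay positive.

```python
import numpy as np

def lin_frac(X, A, B):
    """Linear fractional map g(X) = A X A^dagger / (1 + tr(B X))."""
    return A @ X @ A.conj().T / (1.0 + np.trace(B @ X))

rng = np.random.default_rng(3)
m, n = 5, 3
A = rng.standard_normal((m, n))          # tall, full column rank (almost surely)
Gb = rng.standard_normal((n, n))
B = Gb @ Gb.T                            # PSD, so 1 + tr(BX) > 0 for PSD X
Gx = rng.standard_normal((n, n))
X = Gx @ Gx.T

Z = lin_frac(X, A, B)
A_plus = np.linalg.pinv(A)               # equals (A^dagger A)^{-1} A^dagger here
# Inverse map: linear fractional with parameters A^+ and -A^{+dagger} B A^+
X_rec = lin_frac(Z, A_plus, -A_plus.conj().T @ B @ A_plus)
print(np.allclose(X_rec, X))
```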
D. Proof of Theorem VI.1

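Let $P$ and $Q$ have $\operatorname{rank}(P) = r_P$ and $\operatorname{rank}(Q) = r_Q$, respectively, with $r_P, r_Q \in \{1, \dots, N_T\}$, and let the absolute difference of ranks be $\delta = |r_P - r_Q|$. The pilot matrix and precoder have reduced spectral decompositions $P = U_P \Lambda_P U_P^\dagger$ and $Q = U_Q \Lambda_Q U_Q^\dagger$, respectively, where the eigenbases $U_P \in \mathbb U^{N_T \times r_P}$ and $U_Q \in \mathbb U^{N_T \times r_Q}$ are tall or square, whereas $\Lambda_P$ and $\Lambda_Q$ are diagonal and positive definite. Let $U_{P\perp} \in \mathbb U^{N_T \times (N_T - r_P)}$ and $U_{Q\perp} \in \mathbb U^{N_T \times (N_T - r_Q)}$ denote orthonormal bases of the nullspaces of $P$ and $Q$, respectively, so that $U_P^\dagger U_{P\perp} = 0$ and $U_Q^\dagger U_{Q\perp} = 0$.

The reduced eigendecomposition of $\bar R$ is consistently denoted as $U_{\bar R} \Lambda_{\bar R} U_{\bar R}^\dagger$, where $\Lambda_{\bar R} \in \mathbb R_+^{r_P \times r_P}$ is diagonal positive definite of size $r_P \times r_P$, due to the rank equality (7), which states that $\operatorname{rank}(\bar R) = \operatorname{rank}(P)$. The orthonormal nullspace of $\bar R$ is denoted as $U_{\bar R\perp} \in \mathbb U^{N_T \times (N_T - r_P)}$. We introduce the notation $L_{A\cap B}$ to designate an orthonormal basis of the intersection of the range spaces $\operatorname{range}(A)$ and $\operatorname{range}(B)$. If it exists, $L_{A\cap B}$ is a matrix with the maximum number of columns such that

$$L_{A\cap B}^\dagger L_{A\cap B} = I, \qquad \forall x \neq 0:\; A L_{A\cap B}\, x \neq 0 \;\text{ and }\; B L_{A\cap B}\, x \neq 0. \tag{122}$$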



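Assume that $\operatorname{range}(Q) \not\subseteq \operatorname{range}(\bar R)$, so the matrix $L_{Q\cap\bar R_\perp}$ is defined and has at least one column. We define a new precoder $Q' \in \mathcal Q$ as

$$Q' = Q - \lambda_{r_Q}(Q)\, L_{Q\cap\bar R_\perp} L_{Q\cap\bar R_\perp}^\dagger, \tag{123}$$

where $\lambda_{r_Q}(Q)$ is the smallest non-zero eigenvalue of $Q$. First, we verify that $Q' \in \mathcal Q$. Clearly, since $\lambda_{r_Q}(Q) > 0$, we have $Q' \preceq Q$, and therefore $\operatorname{tr}(Q') < \operatorname{tr}(Q)$. What remains to prove is that $Q' \succeq 0$. The smallest eigenvalue of $Q'$ is

$$\lambda_{\min}(Q') = \min_{\|w\| = 1} w^\dagger Q' w.$$

But since, by definition of $L_{Q\cap\bar R_\perp}$, the range space of $Q$ contains the range space of $L_{Q\cap\bar R_\perp} L_{Q\cap\bar R_\perp}^\dagger$, we have that $w^\dagger Q' w$ is equal to $w^\dagger \Pi_Q Q' \Pi_Q w$, where $\Pi_Q = U_Q (U_Q^\dagger U_Q)^{-1} U_Q^\dagger$ is the projector onto the basis $U_Q$. We thus have

$$\lambda_{\min}(Q') \ge \lambda_{r_Q}(Q)\Big(1 - \lambda_{\max}(\Pi_Q \Pi_Q)\, \lambda_{\max}\big(L_{Q\cap\bar R_\perp} L_{Q\cap\bar R_\perp}^\dagger\big)\Big) = 0. \tag{124}$$

The inequality in (124) holds because the spectral norm $\lambda_{\max}(\cdot)$ is sub-multiplicative, while the last equality holds because the projector $\Pi_Q$ and the (sub)unitary matrix $L_{Q\cap\bar R_\perp}$ both have a largest singular value equal to 1. We infer that $Q' \in \mathcal Q$.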

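Notice that $Q'$ is purposely constructed so that $Q'\bar R = Q\bar R$. As compared to the matrix $S = S(P,Q)$ obtained with the precoder $Q$, the new matrix $S' = S(P,Q')$ thus reads as

$$S' = \frac{\bar R^{1/2} Q' \bar R^{1/2}}{1 + \operatorname{tr}(Q'\tilde R)} = \frac{\bar R^{1/2} Q \bar R^{1/2}}{1 + \operatorname{tr}(Q\tilde R) - \lambda_{r_Q}(Q)\operatorname{tr}\big(L_{Q\cap\bar R_\perp}^\dagger \tilde R L_{Q\cap\bar R_\perp}\big)} = \kappa S \tag{125}$$

and thus turns out to be a scaled version of $S$, where the positive scalar $\kappa$ is

$$\kappa = \frac{1 + \operatorname{tr}(Q\tilde R)}{1 + \operatorname{tr}(Q\tilde R) - \lambda_{r_Q}(Q)\operatorname{tr}\big(L_{Q\cap\bar R_\perp}^\dagger \tilde R L_{Q\cap\bar R_\perp}\big)} > 1,$$

the inequality being strict because $\tilde R \succ 0$. Therefore, we have $S' \succ S$, so the precoder $Q$ is necessarily suboptimal, which means that $\operatorname{range}(Q) \not\subseteq \operatorname{range}(\bar R)$ cannot hold for an optimal $Q$. Instead, we must have $\operatorname{range}(Q) \subseteq \operatorname{range}(\bar R)$ for optimality, which concludes the proof.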
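A small numerical instance of the construction (123) is sketched below. To keep the intersection $\operatorname{range}(Q) \cap \operatorname{null}(\bar R)$ explicit, it assumes $R = I$ and diagonal rank-2 pilots, so that $\operatorname{null}(\bar R) = \operatorname{span}(e_3)$ and, for a full-rank $Q$, $L_{Q\cap\bar R_\perp}$ reduces to $e_3$. These choices and all helper names are illustrative assumptions. The script checks that $Q'$ is feasible and that $S' = \kappa S$ with $\kappa > 1$, as in (125).

```python
import numpy as np

def psd_sqrt(M):
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def effective_snr(R, P, Q):
    """S(P,Q) = Rbar^{1/2} Q Rbar^{1/2} / (1 + tr(Q Rtilde))."""
    R_tilde = np.linalg.inv(np.linalg.inv(R) + P)
    Rb_half = psd_sqrt(R - R_tilde)
    return Rb_half @ Q @ Rb_half / (1.0 + np.trace(Q @ R_tilde))

R = np.eye(3)
P = np.diag([2.0, 1.0, 0.0])            # rank-2 pilots: null(Rbar) = span(e_3)
rng = np.random.default_rng(4)
G = rng.standard_normal((3, 3))
Q = G @ G.T + 0.1 * np.eye(3)           # full rank, so range(Q) is not inside range(Rbar)

L = np.array([[0.0], [0.0], [1.0]])     # basis of range(Q) ∩ null(Rbar) in this example
lam_min = np.linalg.eigvalsh(Q).min()   # smallest non-zero eigenvalue of Q (full rank)
Q_new = Q - lam_min * L @ L.T           # construction (123)

S_old = effective_snr(R, P, Q)
S_new = effective_snr(R, P, Q_new)
kappa = S_new[0, 0] / S_old[0, 0]
print(np.trace(Q_new) < np.trace(Q), np.linalg.eigvalsh(Q_new).min())
print(kappa, np.allclose(S_new, kappa * S_old))
```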

E. Proof of Lemma VI.2



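As a consequence of Theorem VI.1 and of the power monotonicity (24), optimal precoders $Q$ will be elements of $\partial^+\mathcal Q \cap \operatorname{range}(\bar R)$. By definition, this set can be parametrized by $r_P$ non-negative coefficients $\psi = [\psi_1, \dots, \psi_{r_P}]^T \in \mathbb R_+^{r_P}$ stored in a diagonal matrix $\Psi = \operatorname{diag}(\psi)$, and a tall or square (sub)unitary basis $\Gamma \in \mathbb U^{N_T \times r_P}$ as follows:

$$Q_{\Psi,\Gamma} = (\bar R^{1/2})^+ \Gamma \Psi \Gamma^\dagger (\bar R^{1/2})^+, \tag{126}$$

where $(\cdot)^+$ denotes the Moore-Penrose pseudoinverse (cf. Section II), and where the parameter pair $(\Psi, \Gamma)$ shall be subject to the four constraints

$$\Psi \succeq 0, \tag{127a}$$

$$\operatorname{tr}(Q_{\Psi,\Gamma}) = \mu_Q, \tag{127b}$$

$$\Gamma^\dagger \Gamma = I, \tag{127c}$$

$$\operatorname{range}(\Gamma) = \operatorname{range}(\bar R). \tag{127d}$$

The first two constraints ensure that $Q_{\Psi,\Gamma}$ belongs to $\partial^+\mathcal Q$, while the structure of Expression (126) ensures that $Q_{\Psi,\Gamma}$ belongs to $\operatorname{range}(\bar R)$. The third and fourth constraints (127c)-(127d) are clearly not necessary to fulfill $Q_{\Psi,\Gamma} \in \operatorname{range}(\bar R) \cap \partial^+\mathcal Q$, but they induce no loss of generality either and will turn out helpful later. The set $\partial^+\mathcal Q \cap \operatorname{range}(\bar R)$ is thus entirely parametrized by the parameter pair $(\Psi, \Gamma)$ subject to the constraints (127). Consider now the feasible vectors

$$s(P, Q_{\Psi,\Gamma}) = \lambda\!\left(\frac{\bar R^{1/2} Q_{\Psi,\Gamma} \bar R^{1/2}}{1 + \operatorname{tr}(Q_{\Psi,\Gamma}\tilde R)}\right) = \lambda\!\left(\frac{\Gamma \Psi \Gamma^\dagger}{1 + \operatorname{tr}(Q_{\Psi,\Gamma}\tilde R)}\right). \tag{128}$$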

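This vector has at most $r_P$ non-zero entries because $\Psi$ is $r_P \times r_P$. Therefore, we define a vector $\bar s \in \mathbb R_+^{r_P}$ of reduced dimension, which contains the $r_P$ topmost (i.e., largest) entries of $s$. Since $\operatorname{range}(\Gamma) = \operatorname{range}(\bar R)$ [cf. (127d)], the matrix $U_{\bar R}^\dagger \Gamma$ is unitary, so we have

$$\bar s = \frac{\psi}{1 + \operatorname{tr}(Q_{\Psi,\Gamma}\tilde R)} = \frac{\psi}{1 + \operatorname{tr}\big(\Gamma^\dagger (\bar R^{1/2})^+ \tilde R (\bar R^{1/2})^+ \Gamma \Psi\big)}. \tag{129}$$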

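Note that we have not assumed so far that the entries of $\psi$ or $\bar s$ are sorted in any specific order. For notational brevity, call $\alpha$ the vector of entries $\alpha_i = [\Gamma^\dagger (\bar R^{1/2})^+ \tilde R (\bar R^{1/2})^+ \Gamma]_{i,i}$; then

$$\bar s(P, Q_{\Psi,\Gamma}) = \frac{\psi}{1 + \alpha^T \psi}. \tag{130}$$

On the other hand, the second constraint (127b) translates to $\beta^T \psi = \mu_Q$, where $\beta$ denotes the vector of diagonal entries of $\Gamma^\dagger \bar R^+ \Gamma$, i.e., $\beta_i = [\Gamma^\dagger \bar R^+ \Gamma]_{i,i}$. Together, this constraint and equation (130) describe an affine plane of dimension $r_P - 1$, because left-multiplying (130) with $(\frac{1}{\mu_Q}\beta + \alpha)^T$ leads to the affine equation

$$\Big(\frac{1}{\mu_Q}\beta + \alpha\Big)^T \bar s(P, Q_{\Psi,\Gamma}) = 1. \tag{131}$$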

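This affine equation, together with the non-negativity constraint $\psi \succeq 0$ [cf. (127a)], thus delimits an $(r_P - 1)$-dimensional simplex, whose elements fulfill

$$\sum_{i=1}^{r_P} \frac{\bar s_i}{\omega_i} = 1, \qquad \bar s \succeq 0, \tag{132}$$

where

$$\omega_i = \frac{1}{\frac{1}{\mu_Q}\beta_i + \alpha_i}. \tag{133}$$

The $r_P$ vertices of the simplex described by (132) are the axis points $\omega_i e_i$.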
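The identity (131)-(132) holds for any positive $\alpha$, $\beta$ and any $\psi \succeq 0$ satisfying the power constraint $\beta^T\psi = \mu_Q$. The sketch below draws $\alpha$ and $\beta$ at random instead of computing them from a particular $\Gamma$ and $\bar R$ (a simplifying assumption) and confirms that every resulting $\bar s$ lies on the plane $\sum_i \bar s_i/\omega_i = 1$.

```python
import numpy as np

rng = np.random.default_rng(5)
r_P, mu_Q = 4, 2.0
alpha = rng.uniform(0.1, 1.0, r_P)      # stands in for the diagonal entries alpha_i
beta = rng.uniform(0.5, 2.0, r_P)       # stands in for the diagonal entries beta_i
omega = 1.0 / (beta / mu_Q + alpha)     # (133)

sums = []
for _ in range(5):
    psi = rng.uniform(0.0, 1.0, r_P)
    psi *= mu_Q / (beta @ psi)          # enforce beta^T psi = mu_Q, cf. (127b)
    s_bar = psi / (1.0 + alpha @ psi)   # (130)
    sums.append(np.sum(s_bar / omega))  # left-hand side of (132)
print(sums)                             # each entry equals 1 up to rounding
```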

Due to the symmetry property of utilities from the class $\mathcal F$, the ordering of the $\bar s_i$ does not influence the utility value. Assume that, for a given $\bar s \succeq 0$ fulfilling (132), there exists an index permutation $\pi$ such that $\sum_i \bar s_{\pi(i)}/\omega_i < 1$. Then $\bar s$ is suboptimal, since there exists an $\bar s' = k\Pi\bar s$ with $k > 1$, where $\Pi$ is the permutation matrix associated with $\pi$, which also fulfills (132) and yields a larger utility value $f(\bar s') = f(k\Pi\bar s) = f(k\bar s) > f(\bar s)$. Therefore, we can discard all $\bar s$ for which some permutation $\pi$ yields $\sum_i \bar s_{\pi(i)}/\omega_i < 1$. This is equivalent to the requirement that the $\bar s_i$ and $\omega_i$ be ordered in the same way, i.e.,

$$\omega_i < \omega_j \;\Longrightarrow\; \bar s_i \le \bar s_j. \tag{134}$$
Hence, without loss of optimality, we will restrict the set of admissible $\bar s$ to the following convex set, called $\mathcal S$:

$$\mathcal S = \Big\{\bar s \in \mathbb R_+^{r_P} \;\Big|\; \sum_{i=1}^{r_P} \frac{\bar s_i}{\bar\omega_i} = 1,\; \forall i: \bar s_i \ge \bar s_{i+1}\Big\}, \tag{135}$$

where $\bar\omega = [\bar\omega_1, \dots, \bar\omega_{r_P}]^T$ contains the entries of $\omega$ arranged in non-increasing order, i.e., $\bar\omega_1 \ge \dots \ge \bar\omega_{r_P}$. Let us define $r_P$ special points pertaining to $\mathcal S$, which we shall denote as $\sigma^{(n)}$ and define as

$$\forall n = 1, \dots, r_P: \quad \sigma^{(n)} = \mathcal H(\bar\omega_1, \dots, \bar\omega_n) \sum_{j=1}^{n} e_j, \tag{136}$$

where $\mathcal H(\cdot, \dots, \cdot)$ and $e_j$ are defined in the statement of Lemma VI.2. In fact, it is easy to see that the $\sigma^{(n)}$ have non-increasing entries and fulfill $\sum_j \sigma_j^{(n)}/\bar\omega_j = 1$, and thus belong to $\mathcal S$. Now, we will show that the set $\mathcal S'$ of all convex combinations of the $\sigma^{(n)}$, i.e.,

$$\mathcal S' = \Big\{\sum_{n=1}^{r_P} \nu_n \sigma^{(n)} \;\Big|\; \sum_{n=1}^{r_P} \nu_n = 1,\; \forall n: \nu_n \ge 0\Big\}, \tag{137}$$

is the same as the set $\mathcal S$. We know that $\mathcal S'$ is a subset of the convex set $\mathcal S$, since its elements are convex combinations of a collection of points from $\mathcal S$; hence $\mathcal S' \subseteq \mathcal S$. Next, we argue that if some particular point $\sigma$ belonged to $\mathcal S \setminus \mathcal S'$, it would fail to comply with some constraint from the definition (135) of $\mathcal S$, which is a contradiction. From this, it will follow that $\mathcal S = \mathcal S'$.

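Since the $\sigma^{(n)}$ pertain to $\mathcal S$ and are $r_P$ linearly independent vectors, they define the $(r_P - 1)$-dimensional affine plane described by $\sum_{i=1}^{r_P} \sigma_i/\bar\omega_i = 1$. Therefore, to prove the equality $\mathcal S = \mathcal S'$, it will be sufficient to take some point $\sigma = [\sigma_1, \dots, \sigma_{r_P}]^T$ lying on said plane, and show that an infringement of an inequality $\sigma_i \ge \sigma_{i+1}$ implies that $\sigma = \sum_{n=1}^{r_P} \nu_n \sigma^{(n)}$ with coefficients such that either $\sum_n \nu_n \neq 1$ or $\nu_n < 0$ for some index $n$. So, assume that $\sigma_i < \sigma_{i+1}$ for a given $i$. There exist unique coefficients $\nu_n$ such that $\sigma = \sum_{n=1}^{r_P} \nu_n \sigma^{(n)}$. The inequality $\sigma_i < \sigma_{i+1}$ can thus be written as

$$\sum_{n=1}^{r_P} \nu_n e_i^T \sigma^{(n)} < \sum_{n=1}^{r_P} \nu_n e_{i+1}^T \sigma^{(n)}. \tag{138}$$

By inserting (136) into the latter inequality, we get

$$\sum_{n=i}^{r_P} \nu_n \mathcal H(\bar\omega_1, \dots, \bar\omega_n) < \sum_{n=i+1}^{r_P} \nu_n \mathcal H(\bar\omega_1, \dots, \bar\omega_n), \tag{139}$$

which boils down to $\nu_i < 0$, since $\mathcal H(\bar\omega_1, \dots, \bar\omega_i) > 0$. This concludes the proof that $\mathcal S = \mathcal S'$. Also, this simplex $\mathcal S$ contains only Pareto border points, in the sense that $\mathcal S = \partial^+\mathcal S$. In fact, any point $s''$ dominating some point $s' \in \mathcal S$ would fulfill $\sum_{i=1}^{r_P} s''_i/\bar\omega_i > 1$ and thus lie outside $\mathcal S$.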
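The argument around (138)-(139) can be replayed numerically. The sketch assumes $\mathcal H(\bar\omega_1,\dots,\bar\omega_n) = (\sum_{j=1}^n 1/\bar\omega_j)^{-1}$, which is the normalization that makes each $\sigma^{(n)}$ satisfy $\sum_i \sigma_i^{(n)}/\bar\omega_i = 1$; the exact definition is given in the statement of Lemma VI.2 and is not reproduced in this appendix. A sorted point on the plane decomposes with non-negative coefficients summing to one, whereas a point violating $\sigma_2 \ge \sigma_3$ yields $\nu_2 < 0$.

```python
import numpy as np

def vertices(omega_sorted):
    """Matrix whose n-th column is sigma^(n) = H_n (e_1 + ... + e_n), cf. (136),
    assuming H_n = 1 / sum_{j<=n} 1/omega_j."""
    r = len(omega_sorted)
    H = 1.0 / np.cumsum(1.0 / omega_sorted)
    return np.triu(np.ones((r, r))) * H      # entry (i, n) = H_n if i <= n, else 0

omega_bar = np.array([3.0, 2.0, 1.5, 0.5])   # non-increasing example values
V = vertices(omega_bar)

# Sorted point on the plane sum_i s_i / omega_i = 1
s = np.array([0.6, 0.4, 0.4, 0.1])
s /= np.sum(s / omega_bar)
nu = np.linalg.solve(V, s)                   # unique coefficients, cf. (138)
print(nu, nu.sum())                          # non-negative, summing to one

# Point on the same plane violating s_2 >= s_3
t = np.array([0.6, 0.3, 0.5, 0.1])
t /= np.sum(t / omega_bar)
nu_t = np.linalg.solve(V, t)
print(nu_t)                                  # second coefficient is negative, cf. (139)
```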

Now that we have fully characterized the set of Pareto border points $\bar s = \bar s(P, Q_{\Psi,\Gamma})$ for fixed $\Gamma$ as a simplex set $\mathcal S$, we ask what the best choice for $\Gamma$ is under the constraints (127c)-(127d). Clearly, if there exists one single $\Gamma^\star$ that simultaneously maximizes all vertices $\sigma^{(n)}$ in the sense that for any $\Gamma$ we have

$$\sigma^{(n)}(\Gamma^\star) \succeq \sigma^{(n)}(\Gamma), \qquad n = 1, \dots, r_P, \tag{140}$$

then this $\Gamma^\star$ is optimal. Here, $\sigma^{(n)}(\Gamma)$ denotes the value of $\sigma^{(n)}$, as defined in (136), with the $\bar\omega_i$ interpreted as functions of $\Gamma$. Next, we show that such a $\Gamma^\star$ is well-defined and characterize it. We state the multiobjective optimization problem

$$\forall n = 1, \dots, r_P: \quad \max_{\substack{\Gamma^\dagger\Gamma = I \\ \operatorname{range}(\Gamma) = \operatorname{range}(\bar R)}} \mathcal H(\bar\omega_1, \dots, \bar\omega_n). \tag{141}$$

Omitting the range space constraint on $\Gamma$, and using the definition (133) of the coefficients $\omega_i$ together with the definitions of $\alpha_i$ and $\beta_i$, this multiobjective problem reads as

$$\forall n: \quad \min_{\Gamma^\dagger\Gamma = I} \sum_{i=1}^{n} \Big[\Gamma^\dagger (\bar R^{1/2})^+ \big(\tilde R + \tfrac{1}{\mu_Q} I\big)(\bar R^{1/2})^+ \Gamma\Big]_{\pi(i),\pi(i)}, \tag{142}$$

where $\pi$ denotes the permutation which sorts the diagonal entries of the matrix between square brackets in non-decreasing order. If $W D W^\dagger$ denotes the spectral decomposition of $(\bar R^{1/2})^+ (\tilde R + \frac{1}{\mu_Q} I)(\bar R^{1/2})^+$, where $W \in \mathbb U^{N_T \times r_P}$ and $D$ has non-increasingly ordered, positive diagonal entries, then it is well known from majorization theory that the solution of (142) is $\Gamma^\star = W$, up to a column permutation (e.g., [21, Theorem 4.3.26]). It turns out as well that $\operatorname{range}(\Gamma^\star) = \operatorname{range}(W) =$
$\operatorname{range}(\bar R)$, so the range space constraint is systematically fulfilled. The columns of $\Gamma^\star$ contain the eigenvectors $v_i$ of the generalized eigenvalue problem

$$\bar R v_i = \omega_i \big(\tfrac{1}{\mu_Q} I + \tilde R\big) v_i \tag{143}$$

corresponding to the $r_P$ largest generalized eigenvalues $\omega_i$.
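The generalized eigenvalue problem (143) can be reduced to a standard symmetric one through a Cholesky factorization of $\frac{1}{\mu_Q}I + \tilde R$. The following real-valued sketch (random $R$ and $P$, and the helper `gen_eig_desc`, are illustrative assumptions) returns the $\omega_i$ in non-increasing order together with eigenvectors satisfying (143).

```python
import numpy as np

def gen_eig_desc(A, B):
    """Eigenpairs of A v = w B v (A symmetric, B positive definite) via B = C C^T,
    with eigenvalues returned in non-increasing order."""
    C = np.linalg.cholesky(B)
    C_inv = np.linalg.inv(C)
    w, Y = np.linalg.eigh(C_inv @ A @ C_inv.T)
    order = np.argsort(w)[::-1]
    return w[order], C_inv.T @ Y[:, order]     # v = C^{-T} y

rng = np.random.default_rng(6)
n, mu_Q = 4, 2.0
G = rng.standard_normal((n, n)); R = G @ G.T + np.eye(n)   # channel covariance
H = rng.standard_normal((n, n)); P = H @ H.T               # pilot Gram matrix
R_tilde = np.linalg.inv(np.linalg.inv(R) + P)
R_bar = R - R_tilde

B = np.eye(n) / mu_Q + R_tilde
omega, V = gen_eig_desc(R_bar, B)               # problem (143)
residual = R_bar @ V - (B @ V) * omega          # column i: Rbar v_i - omega_i B v_i
print(omega, np.abs(residual).max())
```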



F. Proof of Theorem VII.1

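Assume that $r_P > r_Q$. Similarly as in the proof of Theorem VI.1 in Appendix D, we will proceed by constructing another pilot matrix in $\mathcal P$ which strictly outperforms $P$. Recall that the covariance of the channel estimate is $\bar R = R - \tilde R$, as usual, and $\tilde R = (R^{-1} + P)^{-1}$ is the estimation error covariance. We construct $P'$ as

$$P' = \Big(\tilde R + \lambda_{r_P}(\bar R)\, L_{Q_\perp\cap\bar R} L_{Q_\perp\cap\bar R}^\dagger\Big)^{-1} - R^{-1}, \tag{144}$$

where $\lambda_{r_P}(\bar R)$ is the smallest non-zero eigenvalue of $\bar R$. The subunitary matrix $L_{Q_\perp\cap\bar R}$ is defined as in the proof of Theorem VI.1. It exists and has at least $d = r_P - r_Q$ columns. First, we verify that $P' \in \mathcal P$. In fact, $P'$ can be written out as

$$P' = \Big(R - \big[\bar R - \lambda_{r_P}(\bar R)\, L_{Q_\perp\cap\bar R} L_{Q_\perp\cap\bar R}^\dagger\big]\Big)^{-1} - R^{-1}, \tag{145}$$

where it becomes clear that $P' \succeq 0$, because $\bar R - \lambda_{r_P}(\bar R)\, L_{Q_\perp\cap\bar R} L_{Q_\perp\cap\bar R}^\dagger \succeq 0$. On the other hand, the trace of $P'$ is upper-bounded as

$$\operatorname{tr}(P') = \operatorname{tr}\Big(\big(\tilde R + \lambda_{r_P}(\bar R)\, L_{Q_\perp\cap\bar R} L_{Q_\perp\cap\bar R}^\dagger\big)^{-1} - R^{-1}\Big) < \operatorname{tr}\big(\tilde R^{-1} - R^{-1}\big) = \operatorname{tr}(P). \tag{146}$$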

If we write $\bar R = \bar R(P)$ to stress that it is essentially a function of $P$, then we notice that $P'$ is designed so as to leave the product

$$Q\bar R(P') = Q\big(\bar R - \lambda_{r_P}(\bar R)\, L_{Q_\perp\cap\bar R} L_{Q_\perp\cap\bar R}^\dagger\big) = Q\bar R(P) \tag{147}$$

unchanged, irrespective of whether the pilots are $P$ or $P'$. The same is true for $s = \lambda(S)$, which is left unchanged when replacing $P$ by $P'$, because $s$ depends on $P$ only via the product $Q\bar R(P)$, as seen from the relationship

$$s = \frac{\lambda(\bar R^{1/2} Q \bar R^{1/2})}{1 + \operatorname{tr}(Q\tilde R)} = \frac{\lambda(Q\bar R)}{1 + \operatorname{tr}(QR - Q\bar R)}. \tag{148}$$

We have thus constructed alternative pilots $P'$ which yield the same utility value $f(s)$ while saving training energy, since $\operatorname{tr}(P') < \operatorname{tr}(P)$. We generate another pilot matrix $P'' = \kappa P'$ with $\kappa = \operatorname{tr}(P)/\operatorname{tr}(P')$. The new pilots $P''$ spend the same amount of training energy as $P$, but yield a strictly larger $S'' = S(P'', Q) \succ S(P, Q)$. Hence, $P$ is suboptimal.
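The pilot construction (144)-(145) can be exercised on a toy example. The sketch assumes $R = I$, $P = \operatorname{diag}(3, 1, 0.5)$ and a rank-1 precoder $Q = e_1 e_1^T$, so that $L_{Q_\perp\cap\bar R}$ is simply $[e_2, e_3]$; these values and the helper `s_vector` (which implements (148)) are illustrative assumptions. It confirms that $P'$ uses less training energy with an unchanged $s$, and that rescaling to $P'' = \kappa P'$ increases the non-zero entry of $s$.

```python
import numpy as np

def s_vector(R, P, Q):
    """Sorted eigenvalues of S(P,Q), computed as lambda(Q Rbar) / (1 + tr(Q Rtilde)), cf. (148)."""
    R_tilde = np.linalg.inv(np.linalg.inv(R) + P)
    R_bar = R - R_tilde
    ev = np.linalg.eigvals(Q @ R_bar).real       # lambda(Q Rbar) = lambda(Rbar^{1/2} Q Rbar^{1/2})
    return np.sort(ev)[::-1] / (1.0 + np.trace(Q @ R_tilde))

R = np.eye(3)                       # assumed channel covariance
P = np.diag([3.0, 1.0, 0.5])        # r_P = 3
Q = np.diag([1.0, 0.0, 0.0])        # r_Q = 1 < r_P
R_bar = R - np.linalg.inv(np.linalg.inv(R) + P)

L = np.eye(3)[:, 1:]                # basis of null(Q) ∩ range(Rbar) in this example
lam = np.linalg.eigvalsh(R_bar).min()            # smallest non-zero eigenvalue (Rbar full rank here)
R_bar_new = R_bar - lam * L @ L.T
P_new = np.linalg.inv(R - R_bar_new) - np.linalg.inv(R)      # (145)

kappa = np.trace(P) / np.trace(P_new)
print(np.trace(P_new), np.trace(P))                           # training energy is saved
print(np.allclose(s_vector(R, P_new, Q), s_vector(R, P, Q)))  # same s, cf. (147)-(148)
print(s_vector(R, kappa * P_new, Q), s_vector(R, P, Q))       # rescaled pilots do better
```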

G. Convexity of the set of feasible $\bar R$

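Showing the convexity of the set of feasible $\bar R$ is equivalent to showing the convexity of the set of feasible $\tilde R$, because $\bar R = R - \tilde R$ is merely $\tilde R$ scaled with $-1$ and summed with a constant matrix $R$. Therefore, we show that the set

$$\{\tilde R = (R^{-1} + P)^{-1} \mid P \in \mathcal P\} \tag{149}$$

is convex. For any pair $(P_1, P_2) \in \mathcal P^2$ and any $\alpha \in [0, 1]$, there exists a $P_3 \in \mathcal P$ such that

$$\alpha\tilde R_1 + (1 - \alpha)\tilde R_2 = \tilde R_3, \tag{150}$$

where $\tilde R_i = (R^{-1} + P_i)^{-1}$ for $i = 1, 2, 3$. By isolating $P_3$ in (150), the pilot Gram matrix $P_3$ is given by

$$P_3 = \big[\alpha\tilde R_1 + (1 - \alpha)\tilde R_2\big]^{-1} - R^{-1}. \tag{151}$$

Obviously, since $\tilde R_i \preceq R$ for $i = 1, 2$, we have $P_3 \succeq 0$. What remains to prove is that $\operatorname{tr}(P_3) \le \mu_P$. Knowing that the function $X \mapsto \operatorname{tr}(X^{-1})$ is convex on the positive definite cone $X \succ 0$, we have

$$\begin{aligned} \operatorname{tr}(P_3) &\le \alpha\operatorname{tr}(\tilde R_1^{-1}) + (1 - \alpha)\operatorname{tr}(\tilde R_2^{-1}) - \operatorname{tr}(R^{-1}) \\ &= \alpha\operatorname{tr}(P_1) + (1 - \alpha)\operatorname{tr}(P_2) \\ &\le \mu_P. \end{aligned} \tag{152}$$

Hence, the set of feasible $\bar R$ is convex, and so is the corresponding problem.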
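The convexity argument (150)-(152) can be spot-checked numerically: for random feasible $P_1, P_2$ and several values of $\alpha$, the pilot Gram matrix $P_3$ from (151) should be positive semidefinite and satisfy the trace bound in (152). The dimension, $\mu_P$ and the random draws below are arbitrary assumptions.

```python
import numpy as np

def rand_psd(n, rng):
    G = rng.standard_normal((n, n))
    return G @ G.T

rng = np.random.default_rng(7)
n, mu_P = 3, 5.0
R = rand_psd(n, rng) + np.eye(n)          # channel covariance (positive definite)
R_inv = np.linalg.inv(R)

def pilot_with_trace(rng):
    """Random pilot Gram matrix with tr(P) <= mu_P."""
    P = rand_psd(n, rng)
    return P * (mu_P * rng.uniform(0.2, 1.0) / np.trace(P))

P1, P2 = pilot_with_trace(rng), pilot_with_trace(rng)
Rt1, Rt2 = (np.linalg.inv(R_inv + P) for P in (P1, P2))   # error covariances

checks = []
for a in np.linspace(0.0, 1.0, 6):
    P3 = np.linalg.inv(a * Rt1 + (1 - a) * Rt2) - R_inv    # (151)
    psd_ok = np.linalg.eigvalsh((P3 + P3.T) / 2).min() >= -1e-9
    trace_ok = np.trace(P3) <= a * np.trace(P1) + (1 - a) * np.trace(P2) + 1e-9
    checks.append(bool(psd_ok and trace_ok and np.trace(P3) <= mu_P + 1e-9))
print(checks)
```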
H. Proof of Theorem VIII.1

We will proceed by showing that, in Problem (68), for any given value of the pair $(\mu_P, \mu_Q)$, the search set $s(\mathcal P, \mathcal Q)$, and thus its Pareto border $\partial^+ s(\mathcal P, \mathcal Q)$, is left unchanged whether we allow $(P, Q)$ to take any value within $\mathcal P \times \mathcal Q$, or whether we restrict the choice of the basis $U_P$ such that $\mathcal C(U_P) = \{u_{R,1}, \dots, u_{R,r^\star}\}$, where $r^\star$ denotes the number of non-zero entries of $s^\star$. As a consequence of Theorem VI.2, we will eventually conclude on the desired result $\mathcal C(U_P) = \mathcal C(U_Q) = \{u_{R,1}, \dots, u_{R,r^\star}\}$. To begin with, note that the set $s(\mathcal P, \mathcal Q)$ can be represented as the union

$$s(\mathcal P, \mathcal Q) = \bigcup_{P \in \mathcal P} s(P, \mathcal Q). \tag{153}$$

As a consequence of the rank equality (7), the elements of $s(P, \mathcal Q)$ have at most $r_P$ non-zero entries, because $\operatorname{rank}(S) = \operatorname{rank}(\bar R^{1/2} Q \bar R^{1/2}) \le \operatorname{rank}(\bar R) = \operatorname{rank}(P) = r_P$. They can thus be written as $s(P, Q) = [\bar s(P, Q)^T\; 0^T]^T$ with $\bar s(P, Q) \in \mathbb R_+^{r_P}$ of reduced size. According to Theorem VI.2, this set of reduced-size vectors $\bar s(P, \mathcal Q)$ is the simplex given by the convex hull of the points

$$\mathcal H(\omega_1, \dots, \omega_n) \sum_{j=1}^{n} e_j, \qquad n \in \{1, \dots, r_P\}. \tag{154}$$

Every such simplex is entirely described by $\omega = [\omega_1, \dots, \omega_{r_P}]^T$, the vector of non-increasingly ordered eigenvalues of the matrix $\bar R(\mu_Q^{-1} I + \tilde R)^{-1}$, which is a function of $P$ alone (not of $Q$). Consistently with the notation used so far, $\omega(\mathcal P)$ shall denote the set of feasible $\omega$ given that $P$ belongs to $\mathcal P$. To prove Theorem VIII.1, we will first show that the set of Pareto border points $\partial^+\omega(\mathcal P)$ is still achievable under the restriction $\mathcal C(U_P) \subseteq \{u_{R,1}, \dots, u_{R,N_T}\}$. Recalling that $R = \bar R + \tilde R$ and $\tilde R = (R^{-1} + P)^{-1}$, we write out $\omega$ as



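$$\omega = \lambda\Big(\big(R - (R^{-1} + P)^{-1}\big)\big(\mu_Q^{-1} I + (R^{-1} + P)^{-1}\big)^{-1}\Big).$$

Let us denote $P' = R^{1/2} P R^{1/2}$; then, using the property $\lambda(AB) = \lambda(BA)$, the last expression can be rewritten as

$$\omega = \lambda\Big(\big(I - (I + P')^{-1}\big)\big((\mu_Q R)^{-1} + (I + P')^{-1}\big)^{-1}\Big).$$

Let us now denote $P'' = U_R^\dagger P' U_R$, so that the last expression becomes

$$\omega = \lambda\Big(\big(I - (I + P'')^{-1}\big)\big((\mu_Q \Lambda_R)^{-1} + (I + P'')^{-1}\big)^{-1}\Big). \tag{155}$$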
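The chain of rewritings leading to (155) only uses similarity transformations, so the direct expression and (155) must produce identical eigenvalues. The real-valued sketch below (random $R$, $P$ and the helper names are assumptions) checks this by computing $\lambda(AB^{-1})$ for symmetric $A$ and positive definite $B$ through a Cholesky factor of $B$.

```python
import numpy as np

def gen_eigvals(A, B):
    """Eigenvalues of A B^{-1} (A symmetric, B positive definite), non-increasing."""
    C_inv = np.linalg.inv(np.linalg.cholesky(B))
    return np.sort(np.linalg.eigvalsh(C_inv @ A @ C_inv.T))[::-1]

def sqrtm_pd(M):
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(w)) @ V.T

rng = np.random.default_rng(8)
n, mu_Q = 4, 1.5
G = rng.standard_normal((n, n)); R = G @ G.T + np.eye(n)
H = rng.standard_normal((n, n)); P = H @ H.T
I = np.eye(n)

# Direct form: omega = lambda( Rbar (mu_Q^{-1} I + Rtilde)^{-1} )
R_tilde = np.linalg.inv(np.linalg.inv(R) + P)
omega_direct = gen_eigvals(R - R_tilde, I / mu_Q + R_tilde)

# Form (155) with P'' = U_R^T R^{1/2} P R^{1/2} U_R
lam_R, U_R = np.linalg.eigh(R)
R_half = sqrtm_pd(R)
P2 = U_R.T @ R_half @ P @ R_half @ U_R
M = np.linalg.inv(I + P2)
omega_155 = gen_eigvals(I - M, np.diag(1.0 / (mu_Q * lam_R)) + M)
print(np.allclose(omega_direct, omega_155))
```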



Let us write out the mutual relations linking $P$ and $P''$ in full:

$$P'' = \Lambda_R^{1/2} U_R^\dagger U_P \operatorname{diag}(p)\, U_P^\dagger U_R \Lambda_R^{1/2}, \tag{156a}$$

$$P = U_R \Lambda_R^{-1/2} U_{P''} \operatorname{diag}(p'')\, U_{P''}^\dagger \Lambda_R^{-1/2} U_R^\dagger. \tag{156b}$$

Regarding the (non-reduced) eigendecomposition $P'' = U_{P''} \Lambda_{P''} U_{P''}^\dagger$, with $U_{P''} \in \mathbb U^{N_T \times N_T}$ and $\Lambda_{P''} = \operatorname{diag}(p'') = \operatorname{diag}(p''_1, \dots, p''_{N_T})$, one can say that, if $P$ is drawn from $\mathcal P$, then the corresponding eigenvalue profile $p'' = \lambda(P'') = \lambda(\Lambda_R^{1/2} U_R^\dagger P U_R \Lambda_R^{1/2}) = \lambda(PR)$ [cf. (156a)] is drawn from a feasible set which we shall call $p''(\mathcal P)$, a notation which emphasizes its direct dependence on the domain $\mathcal P$. As to the eigenbasis $U_{P''}$, it obviously belongs to $\mathbb U^{N_T \times N_T}$ by definition; yet, in general, we must presume that not all pairs $(p'', U_{P''}) \in p''(\mathcal P) \times \mathbb U^{N_T \times N_T}$ are jointly feasible, since the eigenbasis $U_{P''}$ and the eigenvalues $p''$ cannot be chosen independently of each other, due to the special structure of Expression (156a). Instead, $U_{P''}$ belongs to a feasible set $\mathcal U_{P''}(p'') \subseteq \mathbb U^{N_T \times N_T}$ (which depends on $p''$), so the overall set of feasible pairs $(p'', U_{P''})$ forms a subset of the Cartesian product $p''(\mathcal P) \times \mathbb U^{N_T \times N_T}$.

However, suppose for a while that $p''$ and $U_{P''}$ can be drawn independently of each other from their respective domains $p''(\mathcal P)$ and $\mathbb U^{N_T \times N_T}$. This assumption then corresponds to a relaxation of the original problem, as it possibly extends the overall set of feasible $P''$, and consequently, of feasible $\omega$. The resulting set of achievable vectors $\omega$ under this relaxation shall be denoted $\hat\omega(\mathcal P) \supseteq \omega(\mathcal P)$ and is formally defined as

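$$\hat\omega(\mathcal P) = \big\{\omega(p'', U_{P''}) \,\big|\, (p'', U_{P''}) \in p''(\mathcal P) \times \mathbb U^{N_T \times N_T}\big\}, \tag{157}$$

wherein the two-argument notation is defined as [cf. (156b)]

$$\omega(p'', U_{P''}) \triangleq \omega\big(U_R \Lambda_R^{-1/2} U_{P''} \operatorname{diag}(p'')\, U_{P''}^\dagger \Lambda_R^{-1/2} U_R^\dagger\big). \tag{158}$$

The set $\hat\omega(\mathcal P)$ can be represented as a double union

$$\hat\omega(\mathcal P) = \bigcup_{p'' \in p''(\mathcal P)} \; \bigcup_{U_{P''} \in \mathbb U^{N_T \times N_T}} \omega(p'', U_{P''}) = \bigcup_{p'' \in p''(\mathcal P)} \omega\big(p'', \mathbb U^{N_T \times N_T}\big). \tag{159}$$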

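As seen from Expression (155), $\omega(p'', U_{P''})$ is monotonic in the eigenvalues $p''$, meaning that

$$\forall d \succeq 0: \quad \omega(p'' + d, U_{P''}) \succeq \omega(p'', U_{P''}).$$

Hence, since we are essentially interested in the Pareto border $\partial^+\hat\omega(\mathcal P)$ of the set $\hat\omega(\mathcal P)$, we can restrict our further analysis to the set³

$$\bigcup_{p'' \in \partial^+ p''(\mathcal P)} \omega\big(p'', \mathbb U^{N_T \times N_T}\big). \tag{160}$$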



The remainder of the proof of Theorem VIII.1 is completed in four successive steps, each of which is detailed in a separate paragraph for the sake of a clearer structure. First, we specify a method for constructing a particular Pareto border point of the set $\omega(p'', \mathbb U^{N_T \times N_T})$ given a particular value of the vector $p''$, where we show that this construction requires the alignment $\mathcal C(U_P) \subseteq \mathcal C(U_R)$. Second, we show that the point constructed this way, besides yielding a Pareto border point of the relaxed set $\hat\omega(\mathcal P)$, is also contained in the smaller (non-relaxed) set $\omega(\mathcal P)$, so it must be a Pareto border point of $\omega(\mathcal P)$ as well. Third, we show that, by varying the eigenvalues $p''$ over the feasible set $p''(\mathcal P)$, with the aforementioned method of constructing particular Pareto border points, we reach the whole Pareto border of $\hat\omega(\mathcal P)$. Fourth, we show that the alignment $\mathcal C(U_P) \subseteq \mathcal C(U_R)$ implies that $U_Q$ must as well be aligned such that $\mathcal C(U_Q) = \mathcal C(U_P)$ to reach the whole feasible set $s(\mathcal P, \mathcal Q)$, and conclude.

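1) Let the orthonormal eigenbasis $U_{P''} = [u_1, \dots, u_{N_T}]$ be spanned by unit vectors $u_i$, where the $i$-th vector $u_i$ is associated with the $i$-th largest eigenvalue $p''_i$. Given a fixed value of $p'' \in p''(\mathcal P)$, we construct a particular point of the Pareto border $\partial^+\omega(p'', \mathbb U^{N_T \times N_T})$ by solving the sequence of optimization problems

$$\forall i \in \{1, \dots, N_T\}: \quad U^{(i)} = \operatorname*{argmax}_{U \in \mathbb U^{N_T \times N_T}} \omega_i(p'', U) \quad \text{s.t.} \quad \forall \ell \in \{1, \dots, i-1\}: \; \omega_\ell(p'', U) = \omega_\ell\big(p'', U^{(i-1)}\big). \tag{161}$$

Clearly, $U^{(N_T)}$ will yield a Pareto optimal point, that is,

$$\omega\big(p'', U^{(N_T)}\big) \in \partial^+\omega\big(p'', \mathbb U^{N_T \times N_T}\big). \tag{162}$$

Next, we will show by induction that $U^{(N_T)} = I$. For this purpose, let us explicitly solve the first problem ($i = 1$) of (161), i.e.,

$$U^{(1)} = \operatorname*{argmax}_{U \in \mathbb U^{N_T \times N_T}} \omega_1(p'', U). \tag{163}$$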



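With Expression (155), this reads as

$$\begin{aligned}
&\max_{U \in \mathbb U^{N_T \times N_T}} \max_{\|v_1\| = 1} \frac{v_1^\dagger\big(I - (I + U\Lambda_{P''}U^\dagger)^{-1}\big)v_1}{v_1^\dagger\big((\mu_Q\Lambda_R)^{-1} + (I + U\Lambda_{P''}U^\dagger)^{-1}\big)v_1} \\
&\quad\le \max_{U \in \mathbb U^{N_T \times N_T}} \frac{1 - \lambda_{\min}\big((I + U\Lambda_{P''}U^\dagger)^{-1}\big)}{\lambda_{\min}\big((\mu_Q\Lambda_R)^{-1}\big) + \lambda_{\min}\big((I + U\Lambda_{P''}U^\dagger)^{-1}\big)} \\
&\quad= \frac{1 - \frac{1}{1 + \lambda_{\max}(\Lambda_{P''})}}{\frac{1}{\mu_Q \lambda_{\max}(\Lambda_R)} + \frac{1}{1 + \lambda_{\max}(\Lambda_{P''})}}
= \frac{1 - \frac{1}{1 + p''_1}}{\frac{1}{\mu_Q \lambda_{\max}(\Lambda_R)} + \frac{1}{1 + p''_1}}.
\end{aligned} \tag{164}$$

This upper bound is tight and achieved if and only if $v_1 = e_1$ and $U$ is of the form

$$U^{(1)} = \begin{bmatrix} 1 & 0 \\ 0 & W^{(1)} \end{bmatrix} \tag{165}$$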



³ Note that the set (160) is generally not the Pareto border of $\hat\omega(\mathcal P)$, but rather a superset thereof.
with some arbitrary $W^{(1)} \in \mathbb U^{(N_T-1)\times(N_T-1)}$. To prove the induction step, we will show that if, for a certain $i \ge 1$, all maximizers $U^{(i)}$ are of the form

$$U^{(i)} = \begin{bmatrix} I_i & 0 \\ 0 & W^{(i)} \end{bmatrix} \tag{166}$$

with some arbitrary $W^{(i)} \in \mathbb U^{(N_T-i)\times(N_T-i)}$, and $\forall \ell = 1, \dots, i$: $v_\ell = e_\ell$, then $U^{(i+1)}$ also has the above block structure (166), with an identity matrix $I_{i+1}$ top left and an arbitrary rotation matrix $W^{(i+1)}$ bottom right. After solving the $i$-th problem, we know that all solutions thereof are of the form (166), which implies that the equality constraints for the $(i+1)$-th problem [cf. (161)] can only be fulfilled if $U^{(i+1)}$ has the same structure as $U^{(i)}$, i.e.,

$$U^{(i+1)} = \begin{bmatrix} I_i & 0 \\ 0 & \tilde W^{(i)} \end{bmatrix} \tag{167}$$

with some unitary matrix $\tilde W^{(i)} \in \mathbb U^{(N_T-i)\times(N_T-i)}$ yet to be determined. According to a straightforward adaptation of the Courant-Fischer Theorem [21, Theorem 4.2.11], the non-increasingly ordered eigenvalues $\lambda_i(AB^{-1})$, with corresponding eigenvectors $v_i$, of a product of two Hermitian matrices $A$ and $B^{-1}$ (with $B \succ 0$) can be expressed as

$$\lambda_{i+1}(AB^{-1}) = \max_{v \perp Bv_1, \dots, Bv_i} \frac{v^\dagger A v}{v^\dagger B v}. \tag{168}$$



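The $(i+1)$-th optimization problem reads as

$$\operatorname*{argmax}_{U \in \mathbb U^{N_T \times N_T}} \; \max_{v_{i+1} \perp Bv_1, \dots, Bv_i} \frac{v_{i+1}^\dagger A v_{i+1}}{v_{i+1}^\dagger B v_{i+1}} \quad \text{s.t.} \quad U = \begin{bmatrix} I_i & 0 \\ 0 & \tilde W^{(i)} \end{bmatrix}, \tag{169}$$

with $A = I - (I + U\Lambda_{P''}U^\dagger)^{-1}$ and $B = (\mu_Q\Lambda_R)^{-1} + (I + U\Lambda_{P''}U^\dagger)^{-1}$ [cf. (155)], and $\forall \ell = 1, \dots, i$: $v_\ell = e_\ell$. When writing out $B$, the vectors involved in the orthogonality constraints $v_{i+1} \perp Be_1, \dots, Be_i$ read as

$$Be_\ell = \left((\mu_Q\Lambda_R)^{-1} + \begin{bmatrix} (I + \Lambda_{P''}^{(i)})^{-1} & 0 \\ 0 & \tilde W^{(i)} (I + \bar\Lambda_{P''}^{(i)})^{-1} \tilde W^{(i)\dagger} \end{bmatrix}\right) e_\ell = \left(\frac{1}{\mu_Q \lambda_\ell(R)} + \frac{1}{1 + p''_\ell}\right) e_\ell, \tag{170}$$

where $\Lambda_{P''}^{(i)} = \operatorname{diag}(p''_1, \dots, p''_i)$ and $\bar\Lambda_{P''}^{(i)} = \operatorname{diag}(p''_{i+1}, \dots, p''_{N_T})$. Thus, the orthogonality constraints simply translate into $v_{i+1} \perp e_1, \dots, e_i$. In other terms, the first $i$ entries of $v_{i+1}$ must be zero. Thus, we can define $(N_T - i) \times (N_T - i)$ matrices $\bar A$ and $\bar B$ as

$$\bar A = I - \big(I + \tilde W^{(i)} \bar\Lambda_{P''}^{(i)} \tilde W^{(i)\dagger}\big)^{-1}, \qquad \bar B = \big(\mu_Q \bar\Lambda_R^{(i)}\big)^{-1} + \big(I + \tilde W^{(i)} \bar\Lambda_{P''}^{(i)} \tilde W^{(i)\dagger}\big)^{-1}, \tag{171}$$

with $\bar\Lambda_R^{(i)} = \operatorname{diag}(\lambda_{i+1}(R), \dots, \lambda_{N_T}(R))$, so that the optimization problem (169) boils down to solving

$$\operatorname*{argmax}_{\tilde W^{(i)} \in \mathbb U^{(N_T-i)\times(N_T-i)}} \; \max_{\|\bar v\| = 1} \frac{\bar v^\dagger \bar A \bar v}{\bar v^\dagger \bar B \bar v}. \tag{172}$$

This problem is fully equivalent in structure to the first optimization problem ($i = 1$) as written out in Equation (164) and has the same solution, i.e., [cf. (165)]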



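$$\tilde W^{(i)} = \begin{bmatrix} 1 & 0 \\ 0 & W^{(i+1)} \end{bmatrix}. \tag{173}$$

Consequently, $U^{(i+1)}$ has indeed the structure (166), which concludes the induction proof. We infer that $U^{(N_T)} = I$, and thus

$$\omega(p'', I) \in \partial^+\omega\big(p'', \mathbb U^{N_T \times N_T}\big). \tag{174}$$

Thus, we have specified a method to construct specific Pareto optimal points of the inner union in (159).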
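The first step of the induction, namely that $U = I$ attains the bound (164), can be illustrated numerically. The sketch (example eigenvalue profiles and random orthogonal $U$ are assumptions) evaluates $\omega(p'', U)$ via (155), compares $\omega(p'', I)$ with the entrywise closed form $p''_i\mu_Q\lambda_i(R)/(1 + p''_i + \mu_Q\lambda_i(R))$ that follows from (155) when all matrices are diagonal, and checks that no random rotation exceeds $\omega_1(p'', I)$.

```python
import numpy as np

def omega(p2, lam_R, mu_Q, U):
    """omega(p'', U) from (155) with P'' = U diag(p'') U^T, in non-increasing order."""
    n = len(p2)
    M = np.linalg.inv(np.eye(n) + U @ np.diag(p2) @ U.T)
    A = np.eye(n) - M
    B = np.diag(1.0 / (mu_Q * lam_R)) + M
    C_inv = np.linalg.inv(np.linalg.cholesky(B))
    return np.sort(np.linalg.eigvalsh(C_inv @ A @ C_inv.T))[::-1]

rng = np.random.default_rng(9)
n, mu_Q = 4, 2.0
lam_R = np.array([3.0, 2.0, 1.0, 0.5])   # eigenvalues of R, non-increasing
p2 = np.array([4.0, 2.5, 1.0, 0.2])      # eigenvalues p'', non-increasing

w_id = omega(p2, lam_R, mu_Q, np.eye(n))
closed_form = p2 * mu_Q * lam_R / (1.0 + p2 + mu_Q * lam_R)   # diagonal case

best_random = 0.0
for _ in range(200):
    U_rand = np.linalg.qr(rng.standard_normal((n, n)))[0]      # random orthogonal matrix
    best_random = max(best_random, omega(p2, lam_R, mu_Q, U_rand)[0])
print(w_id[0], closed_form[0], best_random)
```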
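2) Recalling how $P''$ is obtained from $P \in \mathcal P$, namely [cf. (156a)]

$$P'' = \Lambda_R^{1/2} U_R^\dagger U_P \operatorname{diag}(p)\, U_P^\dagger U_R \Lambda_R^{1/2},$$

we can leverage Theorem VI.2 (although with other variables) to characterize the set $p''(\mathcal P)$ of vectors of feasible, non-increasingly sorted eigenvalues of the above matrix. First note that $P''$ has the same eigenvalues as $U_R P'' U_R^\dagger = R^{1/2} P R^{1/2}$, so that the set $p''(\mathcal P)$ may be defined as [compare with (49)]

$$p''(\mathcal P) = \big\{\lambda(R^{1/2} P R^{1/2}) \,\big|\, P \succeq 0,\; \operatorname{tr}(P) \le \mu_P\big\}. \tag{175}$$

Now, Theorem VI.2 can be applied upon replacing $\bar R$, $\tilde R$, $Q$, $\mathcal Q$ and $\mu_Q$ (as they appear in the formulation of said theorem) with $R$, $0$, $P$, $\mathcal P$ and $\mu_P$, respectively. This leads to $p''(\mathcal P)$ being characterized as the convex hull of the points $\sigma^{(n)}$, $n = 0, \dots, N_T$, defined as

$$\sigma^{(0)} = 0, \qquad \sigma^{(n)} = \mathcal H\big(\mu_P\lambda_1(R), \dots, \mu_P\lambda_n(R)\big) \sum_{j=1}^{n} e_j, \quad n = 1, \dots, N_T. \tag{176}$$

It can be readily verified that all points of this convex hull can be reached when setting $\mathcal C(U_P) \subseteq \mathcal C(U_R)$. When doing so, the eigenbasis of $P''$ is precisely $U_{P''} = I$. But remember that the choice $U_{P''} = I$ was required in the previous paragraph for constructing a Pareto optimal point of $\partial^+\omega(p'', \mathbb U^{N_T \times N_T})$. Consequently, this Pareto optimal point is also contained in the subset $\omega(p'', \mathcal U_{P''}(p'')) \subseteq \omega(p'', \mathbb U^{N_T \times N_T})$, and is thus necessarily a Pareto optimal point of $\omega(p'', \mathcal U_{P''}(p''))$, too.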

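3) We now ask whether all points of the overall Pareto border $\partial^+\hat\omega(\mathcal P)$ are attained by the construction method specified above, i.e., whether

$$\partial^+\hat\omega(\mathcal P) \subseteq \bigcup_{p'' \in \partial^+ p''(\mathcal P)} \omega(p'', I). \tag{177}$$

Let us write out $\omega(p'', I)$ by means of (155) as

$$\Pi\,\omega(p'', I) = p'' \odot \big((\mu_Q r)^{-1} \odot p'' + \xi\big)^{-1}, \tag{178}$$

where $\Pi$ is an $N_T \times N_T$ sorting permutation matrix, $\odot$ denotes componentwise multiplication, $r$ is the vector of eigenvalues of $R$ (i.e., $\Lambda_R = \operatorname{diag}(r)$), $r^{-1}$ denotes the vector of entries $r_i^{-1}$ (i.e., the componentwise reciprocal), and $\operatorname{diag}(\xi) = \Xi = I + (\mu_Q\Lambda_R)^{-1}$. The mapping $p'' \mapsto p'' \odot ((\mu_Q r)^{-1} \odot p'' + \xi)^{-1}$ is clearly injective, since $\xi \succ 0$. Additionally, it has the property that for any real unit-norm vector $e \succeq 0$, there exists a scalar $\epsilon \ge 0$ and a single feasible vector $p'' \in \partial^+ p''(\mathcal P)$ such that

$$p'' \odot \big((\mu_Q r)^{-1} \odot p'' + \xi\big)^{-1} = \epsilon e. \tag{179}$$

To see this, we first rewrite Expression (179) as

$$p'' = \epsilon e \odot \xi \odot \big(1 - \epsilon e \odot (\mu_Q r)^{-1}\big)^{-1}. \tag{180}$$

Since $p'' \succeq 0$, the scalar $\epsilon$ must lie in the semi-open interval $\epsilon \in [0, \min_i \mu_Q r_i/e_i)$. From taking the Euclidean norm of Expression (180), we obtain a function $\epsilon \mapsto \|p''\|_2$ which bijectively maps $[0, \min_i \mu_Q r_i/e_i)$ onto $\mathbb R_+$. Since any $p'' \in \partial^+ p''(\mathcal P)$ has finite norm, there must necessarily exist one single value of $\epsilon$ fulfilling

$$\epsilon e \odot \xi \odot \big(1 - \epsilon e \odot (\mu_Q r)^{-1}\big)^{-1} \in \partial^+ p''(\mathcal P). \tag{181}$$

Consequently, all Pareto optimal points $\partial^+\hat\omega(\mathcal P)$ can be reached by the construction method from paragraphs 2) and 3), so that we may write

$$\partial^+\hat\omega(\mathcal P) \subseteq \omega\big(\partial^+ p''(\mathcal P), I\big). \tag{182}$$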
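The inverse relation (180) and its monotonicity in $\epsilon$ are easy to verify numerically. The sketch below uses arbitrary example values for $\mu_Q$, the eigenvalues $r$ of $R$ and the direction $e$ (all assumptions), maps a grid of admissible $\epsilon$ to $p''$ via (180), checks (179), and confirms that $\|p''\|_2$ grows with $\epsilon$.

```python
import numpy as np

mu_Q = 2.0
r = np.array([3.0, 2.0, 1.0])                     # eigenvalues of R (example values)
xi = 1.0 + 1.0 / (mu_Q * r)                       # diag(xi) = I + (mu_Q Lambda_R)^{-1}
e = np.array([0.6, 0.6, 0.5]); e /= np.linalg.norm(e)   # unit-norm direction, e >= 0

def forward(p2):
    """Entries of omega(p'', I): p'' ⊙ ((mu_Q r)^{-1} ⊙ p'' + xi)^{-1}, cf. (178)."""
    return p2 / (p2 / (mu_Q * r) + xi)

def inverse(eps):
    """p'' = eps e ⊙ xi ⊙ (1 - eps e ⊙ (mu_Q r)^{-1})^{-1}, cf. (180)."""
    return eps * e * xi / (1.0 - eps * e / (mu_Q * r))

eps_max = np.min(mu_Q * r / e)                    # p'' >= 0 requires eps < eps_max
eps_grid = np.linspace(0.0, 0.99 * eps_max, 8)
norms = [np.linalg.norm(inverse(eps)) for eps in eps_grid]
ok = all(np.allclose(forward(inverse(eps)), eps * e) for eps in eps_grid)
print(ok, np.all(np.diff(norms) > 0))             # (179) holds; ||p''|| grows with eps
```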



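4) Now that we have established that the Pareto border $\partial^+\omega(\mathcal P)$ can be reached by setting $\mathcal C(U_P) \subseteq \mathcal C(U_R)$, we have that $\tilde R = (R^{-1} + P)^{-1}$ and $\bar R = R - \tilde R$ acquire the same eigenbasis, up to a column permutation. Specifically, the alignment $\mathcal C(U_P) \subseteq \mathcal C(U_R)$ implies $\mathcal C(U_P) = \mathcal C(U_{\bar R}) \subseteq \mathcal C(U_R)$. But, as a consequence of Theorem VI.2, the alignment $\mathcal C(U_{\bar R}) \subseteq \mathcal C(U_R)$ leads to [cf. (51)]

$$\mathcal C(U_Q) \subseteq \mathcal C(U_{\bar R}). \tag{183}$$

Hence, we obtain $\mathcal C(U_Q) \subseteq \mathcal C(U_P) \subseteq \mathcal C(U_R)$. Since we know from Section VIII-B that $\operatorname{rank}(P^\star) = \operatorname{rank}(Q^\star)$ at any joint optimum $(P^\star, Q^\star)$, we get the desired alignment property

$$\mathcal C(U_P) = \mathcal C(U_Q) \subseteq \mathcal C(U_R). \tag{184}$$

Obviously, in case (184) is a strict inclusion, the eigenbases of $P$ and $Q$ should contain the eigenvectors of $R$ associated with the largest eigenvalues of $R$, hence

$$\mathcal C(U_P) = \mathcal C(U_Q) = \{u_{R,1}, \dots, u_{R,r^\star}\} \subseteq \mathcal C(U_R), \tag{185}$$

which concludes the proof of Theorem VIII.1.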
I. Proof of Lemma VIII.1

For any set $\mathcal A \subset \mathbb R^n$, the Pareto border $\partial^+\mathcal A$ is a subset of the front border $\partial^{\mathrm f}\mathcal A$. In fact, if it were not so, then there would exist a Pareto optimal point, say $a' \in \partial^+\mathcal A$, which would not be the solution to

$$\max_{a \in \mathcal A,\; a = \nu a'} \nu, \tag{186}$$

in that another $a'' \in \mathcal A$ collinear with $a'$ would exist that would have a larger norm, i.e., $\|a''\| > \|a'\|$. Yet this is impossible by the definition of $\partial^+\mathcal A$, because $a''$ would dominate $a'$ in the sense $a'' \succeq a'$ with $a'' \neq a'$, hence the contradiction.

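It thus suffices to prove that $\partial^{\mathrm f} s(r) \subseteq \partial^+ s(r)$ in order to conclude on the set equality $\partial^+ s(r) = \partial^{\mathrm f} s(r)$. For this purpose, take $s'$ to be some point of the front border $\partial^{\mathrm f} s(r)$. Assume that there exists another point $s'' \in s(r)$, different from $s'$, that dominates $s'$, that is, $s'' \succeq s'$. Being an element of the set $s(r)$, which is the union

$$s(r) = \bigcup_{\substack{(\mu_P, \mu_Q) \\ T_\tau \mu_P + (T - T_\tau)\mu_Q \le T\mu}} \; \bigcup_{P \in \mathcal P} s(P, \mathcal Q), \tag{187}$$

the point $s''$ would be contained in at least one of the sets $s(P, \mathcal Q)$. Call $P'' \in \mathcal P$ a pilot Gram matrix of rank $r_{P''}$ such that $s''$ lies in $s(P'', \mathcal Q)$. According to Theorem VI.2, the set $s(P'', \mathcal Q)$ is a simplex consisting of all convex combinations of $r_{P''} + 1$ points $\sigma^{(n)}$, $n = 0, \dots, r_{P''}$, with $\sigma^{(0)} = 0$ and [cf. (154)]

$$\sigma^{(n)} = \mathcal H(\omega_1, \dots, \omega_n) \sum_{j=1}^{n} e_j, \qquad n \in \{1, \dots, r_{P''}\}, \tag{188}$$

where the $\omega_i$ are the non-increasingly ordered eigenvalues of the generalized eigenvalue problem [cf. (48)]

$$\bar R'' v_i = \omega_i \big(\mu_Q^{-1} I + \tilde R''\big) v_i \tag{189}$$

with $\tilde R'' = (R^{-1} + P'')^{-1}$ and $\bar R'' = R - \tilde R''$. Notice that the linearly independent vectors $\sigma^{(n)}$, $n = 1, \dots, r_{P''}$, when linearly combined with non-negative coefficients, generate the set of vectors of $\mathbb R^{N_T}$ having non-negative, non-increasingly sorted entries on positions 1 through $r_{P''}$, and zero entries on positions $r_{P''} + 1$ through $N_T$. Consequently, both $s'$ and $s''$, which by definition have non-increasing non-negative entries, can be written as linear combinations

$$s' = \sum_{n=1}^{r_{P''}} \nu'_n \sigma^{(n)}, \qquad s'' = \sum_{n=1}^{r_{P''}} \nu''_n \sigma^{(n)} \tag{190}$$

with unique non-negative coefficients $\nu'_n$ and $\nu''_n$. Since $s'' \in s(P'', \mathcal Q)$, the coefficients $\nu''_n$ sum up to $\sum_n \nu''_n \le 1$. Now, since $s'$ and $s''$ are distinct, and $s' \preceq s''$ by assumption, we must have

$$\sum_{n=1}^{r_{P''}} \nu'_n < \sum_{n=1}^{r_{P''}} \nu''_n \le 1, \tag{191}$$

because each vertex $\sigma^{(n)}$ satisfies $\sum_i \sigma_i^{(n)}/\omega_i = 1$, so that the sum of the coefficients equals $\sum_i s_i/\omega_i$, which strictly increases under domination. Therefore, $s'$ lies in the interior of $s(P'', \mathcal Q)$. Consequently, for a small enough $\epsilon > 0$, the point $(1 + \epsilon)s'$ is an element of $s(P'', \mathcal Q)$, and thus of $s(r)$, which contradicts the initial assumption that $s' \in \partial^{\mathrm f} s(r)$. Hence $\partial^+ s(r) = \partial^{\mathrm f} s(r)$.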



J. Proof of Lemma VIII.2

Clearly, maximizing $\nu(p, e)$ as defined in (85) is equivalent to minimizing the function

$$\upsilon(p) = 1 + \frac{1}{\nu(p, e)}, \tag{192}$$

where, contrary to $\nu(p, e)$, the direction vector $e$ is omitted in the notation of the function $\upsilon$. Writing the latter function out in full with the help of definitions (83) and (84) yields

$$\upsilon(p) = (T - T_\tau)\big(\upsilon_1(p) + \upsilon_2(p)\big) + \upsilon_3(p), \tag{193}$$

the three functions $\upsilon_1(p)$, $\upsilon_2(p)$ and $\upsilon_3(p)$ in the last line being given by (194a), (194b) and (194c), respectively.



We will now show that the three functions $\upsilon_1(p)$, $\upsilon_2(p)$ and $\upsilon_3(p)$ are all convex functions of $p$ on the interior of $\mathcal D(T\mu) \subset (0, \infty)^{N_T}$, which we shall denote as $\operatorname{int}(\mathcal D(T\mu))$. It is easy to see that $\upsilon_3$ is essentially a linear combination (plus a constant) of functions $1/p_i$ that are convex on the entire open orthant $(0, \infty)^{N_T}$, and thus on $\operatorname{int}(\mathcal D(T\mu)) \subset (0, \infty)^{N_T}$. Similarly, $\upsilon_2$ is convex on the open half-space $\sum_i p_i < T\mu$ and thus on the subset $\operatorname{int}(\mathcal D(T\mu))$ thereof. Finally, $\upsilon_1$ is a linear combination of functions $p \mapsto \frac{1}{p_i (T\mu - \sum_j p_j)}$, each of which is convex in $p$ on $\operatorname{int}(\mathcal D(T\mu))$. This can be shown as follows: take a pair of points $(p^{(1)}, p^{(2)}) \in \operatorname{int}(\mathcal D(T\mu))^2$; then, for any $\theta \in [0, 1]$,



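$$\frac{1}{\big(\theta p_i^{(1)} + (1 - \theta)p_i^{(2)}\big)\big(T\mu - \sum_j (\theta p_j^{(1)} + (1 - \theta)p_j^{(2)})\big)} \le \frac{\theta}{p_i^{(1)}\big(T\mu - \sum_j p_j^{(1)}\big)} + \frac{1 - \theta}{p_i^{(2)}\big(T\mu - \sum_j p_j^{(2)}\big)}, \tag{195}$$

because the left-hand side of the latter inequality is convex in $\theta \in [0, 1]$, since it is of the form

$$A\,\frac{1}{1 + B\theta}\,\frac{1}{1 + C\theta} \tag{196}$$

with constants

$$A = \frac{1}{p_i^{(2)}\big(T\mu - \sum_j p_j^{(2)}\big)} > 0, \qquad B = \frac{p_i^{(1)} - p_i^{(2)}}{p_i^{(2)}}, \qquad C = \frac{\sum_j \big(p_j^{(2)} - p_j^{(1)}\big)}{T\mu - \sum_j p_j^{(2)}}.$$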



The convexity of ( |196| l is best seen by differentiating twice: 



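$$\frac{\mathrm{d}^2}{\mathrm{d}\theta^2}\,\frac{A}{(1 + B\theta)(1 + C\theta)} = \frac{2A}{(1 + B\theta)(1 + C\theta)}\left[\frac{B^2}{(1 + B\theta)^2} + \frac{C^2}{(1 + C\theta)^2} + \frac{BC}{(1 + B\theta)(1 + C\theta)}\right], \quad (197)$$

where $1 + B\theta > 0$ and $1 + C\theta > 0$ by construction: these are the ratios $(1 + r_i p_i)/(1 + r_i p_i^{(2)})$ and $(T\mu - \sum_j p_j)/(T\mu - \sum_j p_j^{(2)})$ evaluated at $p = \theta p^{(1)} + (1-\theta) p^{(2)}$, which lies in the positive orthant and satisfies $\sum_j p_j < T\mu$.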



The above expression is obviously non-negative if $BC \ge 0$. Otherwise, if $BC < 0$, then the expression between square brackets on the right-hand side of the last equality is lower bounded by

$$\frac{B^2}{(1 + B\theta)^2} + \frac{C^2}{(1 + C\theta)^2} + \frac{2BC}{(1 + B\theta)(1 + C\theta)} = \left(\frac{B}{1 + B\theta} + \frac{C}{1 + C\theta}\right)^2 \ge 0. \quad (198)$$

Hence (195) holds, and all three functions $\nu_1$, $\nu_2$ and $\nu_3$ are convex in $p$ on the open set $\operatorname{int}(\mathcal{D}(T\mu))$. Thus, $\tilde{\nu}(p)$ is convex on $\operatorname{int}(\mathcal{D}(T\mu))$. Therefore $\nu(p, e) = 1/(\tilde{\nu}(p) - 1)$, which is a decreasing function of $\tilde{\nu}(p) > 1$, is quasi-concave in $p$ on $\operatorname{int}(\mathcal{D}(T\mu))$, according to Definition VIII.1. Since $\nu(p, e)$ vanishes on the boundary of $\mathcal{D}(T\mu)$ and is continuous in the vicinity of this boundary, we conclude that $p \mapsto \nu(p, e)$ is quasi-concave on the closure $\mathcal{D}(T\mu)$.
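As a numerical sanity check of the convexity argument above, the following Python sketch samples random pairs $p^{(1)}, p^{(2)}$ and verifies three things: that (196) with the stated constants $A$, $B$, $C$ reproduces one summand of $\nu_1$ along the segment, that the closed-form second derivative (197) agrees with a finite-difference estimate and is non-negative, and that the chord inequality (195) holds. It assumes that the relevant points are those with $p > 0$ and $\sum_i p_i < T\mu$; the sampler, the dimension, and the values of $T\mu$ and $r_i$ are hypothetical choices for illustration only.

```python
import random

def g(p, r, i, T_mu):
    """One summand of nu_1 in (195): 1 / ((1 + r_i p_i) (T*mu - sum_j p_j))."""
    return 1.0 / ((1.0 + r[i] * p[i]) * (T_mu - sum(p)))

def abc_constants(p1, p2, r, i, T_mu):
    """Constants A, B, C of (196) for the segment theta*p1 + (1 - theta)*p2."""
    slack2 = T_mu - sum(p2)
    A = 1.0 / ((1.0 + r[i] * p2[i]) * slack2)
    B = r[i] * (p1[i] - p2[i]) / (1.0 + r[i] * p2[i])
    C = (sum(p2) - sum(p1)) / slack2
    return A, B, C

def f(theta, A, B, C):
    """Restriction (196) of g to the segment: A / ((1 + B theta)(1 + C theta))."""
    return A / ((1.0 + B * theta) * (1.0 + C * theta))

def f2_closed_form(theta, A, B, C):
    """Second derivative (197) of f with respect to theta."""
    x, y = 1.0 + B * theta, 1.0 + C * theta
    return 2.0 * A / (x * y) * (B**2 / x**2 + C**2 / y**2 + B * C / (x * y))

def f2_numeric(theta, A, B, C, h=1e-4):
    """Central finite-difference estimate of the second derivative."""
    return (f(theta + h, A, B, C) - 2.0 * f(theta, A, B, C)
            + f(theta - h, A, B, C)) / h**2

def random_interior_point(n, T_mu, rng):
    """Hypothetical sampler: p > 0 with 0.1*T_mu <= sum(p) <= 0.9*T_mu."""
    u = [rng.uniform(0.05, 1.0) for _ in range(n)]
    s = rng.uniform(0.1, 0.9) * T_mu
    total = sum(u)
    return [s * uk / total for uk in u]

def check(trials=2000, n=3, T_mu=10.0, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        r = [rng.uniform(0.1, 2.0) for _ in range(n)]
        i = rng.randrange(n)
        p1 = random_interior_point(n, T_mu, rng)
        p2 = random_interior_point(n, T_mu, rng)
        A, B, C = abc_constants(p1, p2, r, i, T_mu)
        theta = rng.uniform(0.01, 0.99)
        p_theta = [theta * a + (1.0 - theta) * b for a, b in zip(p1, p2)]
        # (196) must reproduce g on the segment.
        assert abs(f(theta, A, B, C) - g(p_theta, r, i, T_mu)) <= 1e-12
        # (197) must match the finite-difference estimate and be non-negative.
        d2 = f2_closed_form(theta, A, B, C)
        assert abs(d2 - f2_numeric(theta, A, B, C)) <= 1e-3 * abs(d2) + 1e-5
        assert d2 >= 0.0
        # Chord inequality (195).
        chord = theta * g(p1, r, i, T_mu) + (1.0 - theta) * g(p2, r, i, T_mu)
        assert g(p_theta, r, i, T_mu) <= chord + 1e-12
    return True

print(check())
```

Random sampling of this kind cannot prove convexity; it only guards against algebra slips in the constants of (196) and in the derivative (197).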



K. Derivation of (97)

Rather than maximizing $\nu(p, e)$, we minimize its reciprocal

$$\frac{1}{\nu(p, e)} = \cdots \quad (199)$$



In the last equality we have made use of the normalization $\sum_i e_i = 1$. The above expression is a non-negatively weighted sum of reciprocals of $p_i$ (plus a positive constant), and thus a convex function of $p$. Therefore, we are dealing with a convex problem, for which a Lagrangian approach yields necessary and sufficient optimality conditions. To minimize this convex function under the (convex) sum constraint $\sum_i p_i \le \mu_p$, we define the Lagrangian

$$L(p, \lambda) = \frac{1}{\nu(p, e)} + \lambda\left(\mathbf{1}^{\mathrm{T}} p - \mu_p\right) \quad (200)$$

and the four associated Karush-Kuhn-Tucker (KKT) conditions

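$$\left.\frac{\partial L}{\partial p}\right|_{p = p^\star} = \mathbf{0}, \qquad \lambda \ge 0, \qquad \lambda\left(\mathbf{1}^{\mathrm{T}} p^\star - \mu_p\right) = 0, \qquad \mathbf{1}^{\mathrm{T}} p^\star - \mu_p \le 0.$$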
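Whatever the specific weights in (199), the text fixes the structure of the problem: a positive constant plus a non-negatively weighted sum of reciprocals $1/p_i$, minimized under $\sum_i p_i \le \mu_p$. The following Python sketch solves the four KKT conditions above for that generic form, using hypothetical strictly positive weights `w` and a hypothetical `budget` standing in for $\mu_p$. Stationarity gives $p_i = \sqrt{w_i/\lambda}$, and since $\lambda > 0$ the budget is met with equality, so $p_i = \mu_p \sqrt{w_i} / \sum_j \sqrt{w_j}$. It illustrates the generic problem only and is not a transcription of (202).

```python
import math
import random

def kkt_allocation(w, budget):
    """Minimize c + sum_i w_i / p_i subject to sum_i p_i <= budget, p_i > 0.

    Stationarity: -w_i / p_i**2 + lam = 0, so p_i = sqrt(w_i / lam).
    With all w_i > 0, lam = 0 cannot satisfy stationarity, so complementary
    slackness forces sum_i p_i = budget, giving sqrt(lam) = sum_j sqrt(w_j) / budget.
    """
    if budget <= 0 or any(wi <= 0 for wi in w):
        raise ValueError("need budget > 0 and strictly positive weights")
    sqrt_lam = sum(math.sqrt(wi) for wi in w) / budget
    return [math.sqrt(wi) / sqrt_lam for wi in w]

def objective(p, w, const=1.0):
    """Generic convex objective: a positive constant plus weighted 1/p_i terms."""
    return const + sum(wi / pi for wi, pi in zip(w, p))

w = [0.5, 0.3, 0.2]                       # hypothetical weights
p_star = kkt_allocation(w, budget=4.0)
best = objective(p_star, w)

# Random feasible points (sum(p) <= budget) should never beat the KKT point.
rng = random.Random(0)
for _ in range(1000):
    u = [rng.uniform(0.01, 1.0) for _ in w]
    scale = rng.uniform(0.5, 1.0) * 4.0 / sum(u)
    assert objective([scale * x for x in u], w) >= best - 1e-12

print(p_star, sum(p_star))
```

Because every $w_i > 0$, the objective is strictly decreasing in each $p_i$, which is why $\lambda > 0$ and the sum constraint is active, mirroring the complementary-slackness step in the derivation.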



With (199), the stationarity condition $\partial L/\partial p\,|_{p = p^\star} = \mathbf{0}$ reads as

$$\cdots \quad (201)$$

whence

$$\cdots \quad (202)$$



Since we must have $\lambda > 0$, the complementary slackness condition $\lambda(\mathbf{1}^{\mathrm{T}} p^\star - \mu_p) = 0$ requires that the inequality constraint be fulfilled with equality, i.e., $\mathbf{1}^{\mathrm{T}} p^\star = \mu_p$. Summing (202) over $i$ and imposing this equality determines $\sqrt{\lambda}$, so the solution (202) reads as